Skip to main content

Bridging WebRTC and SIP with verto


Verto is a newly designed signalling protocol for WebRTC clients interacting with FreeSWITCH. It has an intuitive, JSON-based RPC which allows clients to exchange SDP offers and answers with FreeSWITCH over a WebSocket (and Secure WebSockets are supported). It’s available right now with the 1.4 stable version (1.4.14 at the moment of writing).

The feature I like the most is “verto.attach”: when a client has an active bridge on FreeSWITCH and, for any reason (e.g. a tab refresh) it disconnects, upon reconnection FreeSWITCH automatically re-offers the session SDP and allows the client to immediately reattach to the existing session. I have not seen this implemented in other places and find it extremely useful. I’ve noticed recently that this does not fully work yet when the media is bypassed (e.g. on a verto-verto call), but Anthony Minnesale, on the FreeSWITCH dev mailing list said this feature is still a work in progress, so I’m keeping an eye on it.

Initially I was expecting an integrated solution for endpoint localization, i.e. what a SIP registrar can do to allow routing a call to the right application server. On second thoughts I don’t think this is a problem and there are ways to gather on which FreeSWITCH instance an endpoint is connected, and then route a call to it.

Once a verto endpoint hits the dialplan, it can call other verto endpoints or even SIP endpoints/gateways. I’ve also verified that verto clients can join conference rooms inside FreeSWITCH, and this is not only possible but can be done for conferences involving SIP endpoints as well, transparently.

This brings me to what I think it’s the strongest proposition of verto: interoperability with SIP.

In my opinion WebRTC is an enormous opportunity, and a technology that will revolutionize communications over Internet. WebRTC has been designed with peer-to-peer in mind, and this is the right way to go, however if you want to interoperate with VoIP (either directly or as a gateway to PSTN and GSM) you can’t ignore SIP.

I’m not worried about Web-to-Web calls: there are already many solutions out there, and each day there’s something new. Many new signalling protocols are being designed, since WebRTC standardization, on purpose, hasn't mandated any specific protocol for signalling. Verto is a viable solution when on the other side you have SIP.

I've been experimenting on this for some time now. In August I presented a solution for WebRTC/SIP interoperation, based on Kamailio andFreeSWITCH, at ClueCon. In that case signalling was accomplished with SIP on both sides (using the JsSIP library on the clients); unsurprisingly, after using verto, SIP on the web browser client side looks even more redundant, over-complex, but most of all with a steeper learning curve for web developers, and this is becoming every day a stronger selling point for new signalling protocols for WebRTC applications.

Web browsers running on laptops can easily manage multiple media streams incoming from a multi-party call. This is not true for applications running on mobile devices or gateways: they prefer a single media stream for each “conference call”, for resource optimization and typical lack of support respectively (1). Verto-SIP can represent a solution to bridge the web/multistream world with the VoIP/monostream one, for example by having the participants inside a conference room.

When video is involved though, things get as usual more complicated. WebRTC applications can benefit from managing one video stream per call participant, and a web page can present the many video streams in many ways.

But this can easily become too cumbersome for applications on mobile devices. We need to be able to send one single audio stream and video stream. And whilst the audio streams are “easy” to multiplex, how do you do that for video? Do you stream only the video from the active speaker (as FreeSWITCH does by default on conferences), or do you build a video stream with one video box per participant? The Jitsi VideoBridge is a clever solution leveraging a multi-stream approach, but again, how about applications running on mobile devices?

For what concerns signalling interoperation/federation there is an interesting analysis at the Matrix project blog. The experience gathered last Friday when hacking Matrix/SIP interoperability through verto/FreeSWITCH has also shown some key points about ICE negotiation: I recommend reading it.

My view is that there are two key points that will allow a solution to be successful in the field of Web-based communications involving “traditional” Internet telephony but also mobile applications:

  1. Interoperability with SIP.
  2.  The ability to provide one single media stream per application/gateway, should they require it.


What do you think?


(1)   Yes, I know that nothing prevents a SIP client to manage multiple streams, but practically speaking it’s not common.

Popular posts from this blog

Troubleshooting TURN

  WebRTC applications use the ICE negotiation to discovery the best way to communicate with a remote party. I t dynamically finds a pair of candidates (IP address, port and transport, also known as “transport address”) suitable for exchanging media and data. The most important aspect of this is “dynamically”: a local and a remote transport address are found based on the network conditions at the time of establishing a session. For example, a WebRTC client that normally uses a server reflexive transport address to communicate with an SFU. when running inside the home office, may use a relay transport address over TCP when running inside an office network which limits remote UDP targets. The same configuration (defined as “iceServers” when creating an RTCPeerConnection will work in both cases, producing different outcomes.

Extracting RTP streams from network captures

I needed an efficient way to programmatically extract RTP streams from a network capture. In addition I wanted to: save each stream into a separate pcap file. extract SRTP-negotiated keys if present and available in the trace, associating them to the related RTP (or SRTP if the negotiation succeeded) stream. Some caveats: In normal conditions the negotiation of SRTP sessions happens via a secure transport, typically SIP over TLS, so the exchanged crypto information may not be available from a simple network capture. There are ways to extract RTP streams using Wireshark or tcpdump; it’s not necessary to do it programmatically. All this said I wrote a small tool ( https://github.com/giavac/pcap_tool ) that parses a network capture and tries to interpret each packet as either RTP/SRTP or SIP, and does two main things: save each detected RTP/SRTP stream into a dedicated pcap file, which name contains the related SSRC. print a summary of the crypto information exchanged, if available. With ...

Testing SIP platforms and pjsip

There are various levels of testing, from unit to component, from integration to end-to-end, not to mention performance testing and fuzzing. When developing or maintaining Real Time Communications (RTC or VoIP) systems,  all these levels (with the exclusion maybe of unit testing) are made easier by applications explicitly designed for this, like sipp . sipp has a deep focus on performance testing, or using a simpler term, load testing. Some of its features allow to fine tune properties like call rate, call duration, simulate packet loss, ramp up traffic, etc. In practical terms though once you have the flexibility to generate SIP signalling to negotiate sessions and RTP streams, you can use sipp for functional testing too. sipp can act as an entity generating a call, or receiving a call, which makes it suitable to surround the system under test and simulate its interactions with the real world. What sipp does can be generalised: we want to be able to simulate the real world tha...