Verto is a newly designed signalling protocol for WebRTC
clients interacting with FreeSWITCH. It has an intuitive, JSON-based RPC which
allows clients to exchange SDP offers and answers with FreeSWITCH over a
WebSocket (Secure WebSockets are also supported). It’s available right now with
the 1.4 stable version (1.4.14 at the time of writing).
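To give an idea of what the JSON-based RPC looks like, here is a minimal Python sketch that builds a `verto.invite` request carrying an SDP offer as a JSON-RPC 2.0 message. The layout of `params` (`sdp`, `dialogParams`, `callID`, `destination_number`, `sessid`) reflects my reading of the verto JavaScript client and should be treated as an assumption. `make_invite` is a hypothetical helper, and sending the resulting string over the WebSocket is left out.

```python
import json
import uuid


def make_invite(sdp_offer: str, destination: str, session_id: str, request_id: int) -> str:
    """Build a JSON-RPC 2.0 'verto.invite' request carrying an SDP offer.

    Assumption: the params layout (sdp, dialogParams, sessid) mirrors what
    the verto JavaScript client appears to send; it is not a spec.
    """
    call_id = str(uuid.uuid4())  # client-generated call identifier (assumed)
    message = {
        "jsonrpc": "2.0",
        "method": "verto.invite",
        "params": {
            "sdp": sdp_offer,  # the local WebRTC SDP offer
            "dialogParams": {
                "callID": call_id,
                "destination_number": destination,  # what the dialplan will match on
            },
            "sessid": session_id,  # the verto session this call belongs to (assumed)
        },
        "id": request_id,
    }
    return json.dumps(message)
```

The point is less the exact field names than the shape: one JSON object per request, with the SDP travelling as a plain string, which is what makes it approachable for web developers.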
The feature I like the most is “verto.attach”: when a client
has an active bridge on FreeSWITCH and disconnects for any reason (e.g. a tab
refresh), upon reconnection FreeSWITCH automatically re-offers the session
SDP, allowing the client to immediately reattach to the existing session. I
have not seen this implemented elsewhere and find it extremely useful. I’ve
noticed recently that this does not fully work yet when the media is bypassed
(e.g. on a verto-verto call), but Anthony Minessale said on the FreeSWITCH dev
mailing list that this feature is still a work in progress, so I’m keeping an
eye on it.
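The reattach flow can be sketched from the client's point of view. This is a hypothetical bookkeeping class, not the verto client code: it assumes the server's `verto.attach` request carries the call identifier in `params.callID` and the re-offered SDP in `params.sdp`, and the shape of the acknowledgement is also an assumption. Creating the actual WebRTC answer from the re-offered SDP is out of scope.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CallState:
    call_id: str
    remote_sdp: Optional[str] = None
    attached: bool = False


@dataclass
class VertoClientState:
    # Hypothetical client-side store of known calls, keyed by callID.
    calls: Dict[str, CallState] = field(default_factory=dict)

    def handle_message(self, raw: str) -> Optional[str]:
        """Process one incoming message; return a reply string or None."""
        msg = json.loads(raw)
        if msg.get("method") != "verto.attach":
            return None  # other methods are not handled in this sketch
        params = msg["params"]
        call_id = params["callID"]  # assumed field name
        # Reuse the existing call if we still know it, otherwise recreate it
        # (e.g. after a tab refresh wiped the in-memory state).
        call = self.calls.setdefault(call_id, CallState(call_id))
        call.remote_sdp = params["sdp"]  # the session SDP re-offered by FreeSWITCH
        call.attached = True
        if "id" not in msg:
            return None  # a notification needs no JSON-RPC response
        # Assumed acknowledgement: a plain JSON-RPC 2.0 result.
        return json.dumps({"jsonrpc": "2.0", "id": msg["id"],
                           "result": {"method": "verto.attach"}})
```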
Initially I was expecting an integrated solution for endpoint
location, i.e. what a SIP registrar does to allow a call to be routed to the
right application server. On second thought I don’t think this is a problem:
there are ways to find out which FreeSWITCH instance an endpoint is
connected to, and then route a call to it.
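One way to do that, sketched under my own assumptions: a shared registry (the `EndpointLocator` class below is hypothetical) that each FreeSWITCH node updates when a verto endpoint logs in or out, and that the routing logic queries to reach the right instance, expressed here as a plain SIP URI. How the nodes report these events to the registry is left out.

```python
from typing import Dict, Optional


class EndpointLocator:
    """Hypothetical registry mapping verto endpoints to FreeSWITCH hosts."""

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}

    def on_login(self, endpoint: str, fs_host: str) -> None:
        # The most recent login wins, like a fresh registration.
        self._location[endpoint] = fs_host

    def on_logout(self, endpoint: str, fs_host: str) -> None:
        # Forget the endpoint only if it is still bound to this node, so a
        # late logout from an old node can't erase a newer login elsewhere.
        if self._location.get(endpoint) == fs_host:
            del self._location[endpoint]

    def route(self, endpoint: str) -> Optional[str]:
        """Return a SIP URI pointing at the node holding the endpoint, or None."""
        host = self._location.get(endpoint)
        if host is None:
            return None
        return f"sip:{endpoint}@{host}"
```

The guard in `on_logout` matters when a client reconnects to a different node before the old connection times out.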
Once a verto endpoint hits the dialplan, it can call other
verto endpoints or even SIP endpoints/gateways. I’ve also verified that verto
clients can join conference rooms inside FreeSWITCH, transparently, even when
those conferences involve SIP endpoints.
This brings me to what I think is the strongest
proposition of verto: interoperability with SIP.
In my opinion WebRTC is an enormous opportunity, and a
technology that will revolutionize communications over the Internet. WebRTC has
been designed with peer-to-peer in mind, and this is the right way to go;
however, if you want to interoperate with VoIP (either directly or as a gateway
to PSTN and GSM) you can’t ignore SIP.
I’m not worried about Web-to-Web calls:
there are already many solutions out there, and each day there’s something new.
Many new signalling protocols are being designed, since the WebRTC standards
deliberately do not mandate any specific signalling protocol. Verto is a viable
solution when you have SIP on the other side.
I've been experimenting with this for some time now. In August I presented a solution for WebRTC/SIP interoperation, based on Kamailio and FreeSWITCH, at ClueCon. In that case signalling was accomplished with SIP on
both sides (using the JsSIP library on the clients). Unsurprisingly, after
using verto, SIP on the browser side looks even more redundant and over-complex,
and above all it has a steeper learning curve for web developers. This is
becoming, every day, a stronger selling point for new signalling protocols for
WebRTC applications.
Web browsers running on laptops can easily manage multiple
media streams coming from a multi-party call. This is not true for
applications running on mobile devices or for gateways: they prefer a single media
stream per “conference call”, respectively for resource optimization and because
of a typical lack of support (1). Verto-SIP can represent a solution to bridge the
web/multistream world with the VoIP/monostream one, for example by placing all
participants in a FreeSWITCH conference room, which mixes the audio into a single
stream per participant.
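The trick a conference room plays for audio is to mix the incoming streams and send each participant one stream containing everybody else, often called a “mix-minus”. The Python sketch below illustrates that technique on raw 16-bit PCM frames. It is a generic illustration, not FreeSWITCH's mixer, and it assumes every participant's frame has the same number of samples.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767  # signed 16-bit sample range


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each participant, return one mixed frame of everyone else's audio."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum every participant once...
    total = [sum(frame[i] for frame in frames.values()) for i in range(length)]
    mixed = {}
    for name, frame in frames.items():
        # ...then remove the listener's own voice and clamp to 16-bit range.
        mixed[name] = [max(PCM_MIN, min(PCM_MAX, total[i] - frame[i]))
                       for i in range(length)]
    return mixed
```

However many people are in the room, each participant, whether a browser, a mobile app or a SIP gateway, receives exactly one audio stream.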
When video is involved, though, things get more complicated, as usual.
WebRTC applications can benefit from managing one video stream per call
participant, and a web page can present the many video streams in many
ways.
But this can easily become too cumbersome for applications on mobile
devices. We need to be able to send a single audio stream and a single video stream.
And whilst the audio streams are “easy” to multiplex, how do you do that for
video? Do you stream only the video from the active speaker (as FreeSWITCH does
by default on conferences), or do you build a video stream with one video box
per participant? The Jitsi Videobridge is a clever solution leveraging a
multi-stream approach, but again, what about applications running on mobile
devices?
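The two video options can be illustrated with a small Python sketch: `active_speaker` forwards only the loudest participant (assuming per-participant audio energy values are available), while `grid_layout` computes one box per participant on a near-square grid inside a single composed frame. Both function names and the layout rule are my own illustration, not how FreeSWITCH or Jitsi Videobridge implement it.

```python
import math
from typing import Dict, List, Tuple


def active_speaker(energy: Dict[str, float]) -> str:
    """Option 1: pick the loudest participant; only their video is forwarded."""
    return max(energy, key=energy.get)


def grid_layout(participants: List[str], width: int,
                height: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Option 2: compose one video with an (x, y, w, h) box per participant."""
    n = len(participants)
    if n == 0:
        return {}
    cols = math.ceil(math.sqrt(n))  # near-square grid
    rows = math.ceil(n / cols)
    tile_w, tile_h = width // cols, height // rows
    return {
        p: ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i, p in enumerate(participants)
    }
```

With three participants on a 1280x720 canvas, the grid is 2x2 with 640x360 boxes and one cell left empty. The first option is cheap but hides everyone but the speaker; the second keeps everyone visible at the cost of decoding and re-encoding every stream on the server.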
As for signalling interoperation/federation, there
is an interesting analysis on the Matrix project blog. The experience gathered
last Friday while hacking Matrix/SIP interoperability through verto/FreeSWITCH
has also brought out some key points about ICE negotiation: I recommend reading it.
My view is that there are two key points that will allow a
solution to be successful in Web-based communications involving “traditional”
Internet telephony as well as mobile
applications:
- Interoperability with SIP.
- The ability to provide one single media stream per application/gateway, should they require it.
What do you think?
(1) Yes, I know that nothing prevents a SIP client
from managing multiple streams, but in practice it’s not common.