
Hacking our way through Astricon

This year I was speaking at Astricon for Truphone Labs (you can see my slides here if you're interested).

The week before Astricon I was invited to try out respoke, a solution that allows you to build a WebRTC-based service.
They provide you with a client JavaScript library, and you need a server account (with an app ID and secret key) to connect your application to the respoke server and allow clients to communicate with each other.
This is an intuitive approach. As a service developer, you pay for the server usage, and you do so depending on how many concurrent clients you want.
I got a testing account, and started trying out the JS library. The process of building a new application was very straightforward, and the docs guided me towards building a simple app to make audio calls, and then video calls as well.

Soon I started thinking: respoke is from Digium, right, and Digium develops Asterisk. How is it possible that respoke and Asterisk cannot interconnect? What I’d like to do is place a call from the web, and in some circumstances route it to a SIP client, a PSTN line, or a mobile phone.

It turned out that my expectation was quite justified: 36 hours before the beginning of the Astricon Hackathon, Digium announced chan_respoke, a new module for Asterisk (13) that allows Asterisk to connect as a respoke client and communicate with JS clients.

So that was the good news. The bad news was that we didn't have any time to prepare before the Hackathon, so we had about 8 hours to get up to speed and build something… sexy!
The other service that the Astricon Hackathon encouraged us to use was Clarify, which provides APIs to upload audio recordings and can detect specific “tag words” in them.

Among the ideas discussed while the teams were forming, the most complete and compelling one, and one that probably did require all 5 people working together, was GrannyCall. You can see some details here, with the list of team members.

GrannyCall was conceived as a system for kids to call their granny (or daddy, mummy, etc.) from a simple web page, and get a score depending on how appropriate and rich their vocabulary was.
For example, we wanted to give a positive score for words like “love” or “cookie”, and perhaps a negative score for… well, you can guess some words that would score badly for a kid talking to their granny.
The project was quite ambitious, because the call would originate from a web page built with Voxbone’s WebRTC library, reach Asterisk over SIP, and then ring the granny on a web page built with the respoke client.
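The scoring idea can be sketched in a few lines of Python. This is only an illustration and not the code we wrote at the Hackathon: the weights, the negative placeholder word and the function name `score_call` are all assumptions, and the input is assumed to be the list of tag words a service like Clarify reports for a recording.

```python
from collections import Counter

# Hypothetical weights. The post only names "love" and "cookie" as
# positive examples; "stupid" is a placeholder for a negative word.
TAG_WEIGHTS = {"love": 2, "cookie": 1, "stupid": -3}


def score_call(detected_tags, weights=TAG_WEIGHTS):
    """Return the total score for the tag words detected in one call.

    Each occurrence of a word adds its weight; words that are not in
    the weights table contribute nothing. Matching is case-insensitive.
    """
    counts = Counter(word.lower() for word in detected_tags)
    return sum(weights.get(word, 0) * n for word, n in counts.items())


if __name__ == "__main__":
    print(score_call(["love", "Cookie", "love", "hello"]))
```

Counting occurrences rather than distinct words rewards a kid who says “love” more than once; scoring each word only once would be an equally reasonable choice.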

We used an Ubuntu VM from DigitalOcean to host Asterisk and the web servers (nginx) for the two web pages, and an external web server to interconnect to the Clarify APIs for uploading the recordings (with the desired tags).

Asterisk needed to be version 13, with chan_respoke built and configured. The DigitalOcean box had a public IP, so no special networking setup was needed.

The “kid to Voxbone to Asterisk” leg allowed for some preparation, and it worked smoothly as soon as the web server was built and the client page uploaded.
While Asterisk was being built and configured, we built the granny web page with the respoke library. Here again it was quite easy and quick to set up a call between two respoke clients, peer to peer, just to test the client application in the browsers.
The tricky part was originating the call from Asterisk to the granny web page, using chan_respoke.
For the sake of testing the connection and media establishment, we made some calls from the respoke client to Asterisk, hitting an announcement and an echo test. That worked almost immediately, and it was great!

Now came the key moment: could we do the full flow (kid – Voxbone – Asterisk – respoke – granny)? In terms of establishing the call, i.e. signalling, that worked too after just a few tweaks to chan_respoke’s configuration. But what about audio?

It turned out that there were some problems in the ICE negotiation between the respoke client and Asterisk: we had audio in only one direction. We were using Chrome at that moment, and moving to Firefox didn’t help, so we suspected a bug in the libraries.
Considering how new the libraries were, this looked completely understandable, and the respoke guys spent a lot of time helping us investigate the problem and trying to find a proper solution before the submission deadline (this resulted in a patch being applied on the server side within the next few hours).

Honestly, I was happy that the Hackathon was scheduled for a relatively short eight hours: had it been any longer, we probably wouldn’t have had dinner or a proper sleep (and the 8 (or 9) hour jet lag was not particularly helpful!).

At submission time, we didn't have two-way audio. The debugging also ate into the precious time we needed to prepare the presentation to the judges, and this could be the reason why we weren't awarded any prize. Honestly, given the intensity of the effort and the complexity of the project, I was hoping for at least an honorable mention, but I hope we can gather the same team again on a different occasion and get different results!

Joking aside, it’s been an extremely useful experience. No documentation or remote communication can replace live interaction while working on a proof of concept, in particular if you have a crazy deadline and a body full of caffeine and sugar (and only a few hours' sleep in the last 36 hours).

Of course we took some shortcuts, like removing any firewall from the host, using a shared Linux user authenticated via password rather than SSH keys, editing files in place, etc. We did those things knowing they weren't best practices, but aiming to complete a proof of concept as quickly as possible.

The takeaway from all this is very simple: if you’re developing a new technology, or a new solution oriented to developers, do whatever you can to involve the developers in a productive, challenging way. Hackathons represent a great solution, even if confined within a company, department or team. The excitement and the feedback (and debugging) you'll help to generate will have a tremendous value.
