
VoIP calls encoded with SILK: from RTP to WAV


SILK is a codec developed by Skype, but it can be found in many VoIP clients, such as CSipSimple.
It comes in different flavours (sample rates and frame sizes), from narrowband (8 kHz) to super-wideband (24 kHz).
Since Wireshark can't decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this also earned me a "tumbleweed" badge on Stack Overflow.
You may argue that a Wireshark plugin would be the right solution, but that's probably for another day.

Initially I thought it would be sufficient to read the specification of the RTP payload format for SILK; in truth, I had to reverse engineer a solution by looking at the SILK SDK's test vectors.
There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("#!SILK"), but a slightly different one ("#!SILK_V3").

More importantly, each encoded frame is not preceded by a block header, but by two bytes specifying its length.
Given these findings, it was a matter of extracting the RTP payload from each packet and writing it to a file, preceded by its length.
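
The file layout described above can be sketched in a few lines of Python. This is only an illustration, not the code I used: the function name `build_silk_bitstream` is made up, the magic string is written as `#!SILK_V3` (as I recall it from the SDK sources), and the two length bytes are assumed to be a little-endian 16-bit integer, which is what the SDK tools produce on x86.

```python
import struct

# Assumption: magic string as found at the start of the SILK SDK test vectors.
SILK_MAGIC = b"#!SILK_V3"


def build_silk_bitstream(frames):
    """Return the bytes of a SILK file: magic header, then length-prefixed frames."""
    out = bytearray(SILK_MAGIC)
    for frame in frames:
        if len(frame) > 0x7FFF:
            raise ValueError("frame too large for a 16-bit length field")
        # Assumption: the two length bytes are a little-endian 16-bit integer.
        out += struct.pack("<H", len(frame))
        out += frame
    return bytes(out)
```

Each element of `frames` would be the payload of one RTP packet, in sequence order.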

In Wireshark, I selected the RTP stream I wanted to decode and exported it as raw binary.

A problem was that SILK doesn't use a fixed length for an audio frame. By using libpcap (libpcap0.8 on Debian squeeze), though, I could simply loop over the list of packets, read each packet's length, subtract the length of the headers and obtain the exact payload length. I did this in C, but any other libpcap binding (e.g. for Python or Perl) would do. Using libpcap is not strictly necessary, but it helps, in particular when padding is involved.
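
To make the header and padding arithmetic concrete, here is a minimal Python sketch of the payload extraction for a single packet. It is not my original C code: `rtp_payload` is a hypothetical helper, and it assumes that the Ethernet, IP and UDP headers have already been stripped, so that `packet` starts at the RTP header. The field layout follows RFC 3550.

```python
import struct


def rtp_payload(packet):
    """Return the payload bytes of one RTP packet (input starts at the RTP header)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    # Fixed header is 12 bytes, followed by 4 bytes per CSRC entry.
    offset = 12 + 4 * csrc_count
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension: 16-bit profile id, then 16-bit length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        # The last byte holds the number of padding bytes, itself included.
        end -= packet[-1]
    if offset > end:
        raise ValueError("header and padding exceed the packet length")
    return packet[offset:end]
```

With the padding bit set, the trailing padding is dropped, so only the SILK frame itself ends up in the bitstream.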

Once the bitstream was ready, I decoded it to raw PCM with the decoder available in the SILK SDK (downloadable from here). I knew the original encoded audio was at 24 kHz and 20 ms per frame, which also happen to be the decoder's default settings.

$ ./decoder ~/silk_from_rtp.bit ~/silk_from_rtp.raw

Going from raw PCM to a WAV file, handy to play on any PC, is an easy step, and sox does the job. I just had to specify the sample rate (again 24 kHz, of course) and the encoding (16-bit signed, little-endian), and that was it!

$ sox -V -t raw -b 16 -e signed-integer -r 24000 silk_from_rtp.raw silk_from_rtp.wav
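
If sox is not at hand, the same conversion can probably be done with Python's standard `wave` module. This is a sketch under the same assumptions as the command above (mono, 16-bit signed little-endian samples at 24 kHz); `pcm_to_wav` is a name I made up for the example.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_file, sample_rate=24000):
    """Wrap raw 16-bit signed little-endian mono PCM in a WAV container.

    wav_file may be a path or a writable, seekable binary file object.
    """
    with wave.open(wav_file, "wb") as wav:
        wav.setnchannels(1)       # assumption: mono audio
        wav.setsampwidth(2)       # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


# Usage (file names as in the commands above):
# with open("silk_from_rtp.raw", "rb") as f:
#     pcm_to_wav(f.read(), "silk_from_rtp.wav")
```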

UPDATE (11/9/2014): The SILK SDK (and dev.skype.com) have disappeared. If you want to download it, try this (I used version 1.0.9).
