Tuesday, 11 June 2013

VoIP calls encoded with SILK: from RTP to WAV


SILK is a codec defined by Skype, but it can also be found in many other VoIP clients, such as CSipSimple.
It comes in different flavours (sample rates and frame sizes), from narrowband (8 kHz) to super-wideband (24 kHz).
Since Wireshark doesn't allow you to decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this has also allowed me to earn a "tumbleweed" badge on Stack Overflow.
You may argue that a Wireshark plugin would be the right solution, but that's probably for another day.

Initially I thought it would be sufficient to read the specification of the RTP payload format for SILK; the truth is that I had to reverse engineer a solution by looking at the SILK SDK's test vectors.
There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("#!SILK"), but a slightly different one ("#!SILK_V3").

More importantly, each encoded frame is not preceded by a block header, but by two bytes specifying its length.
Given these findings, it was a matter of extracting the RTP payload for each packet.
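As a rough illustration of the file layout described above, here is a minimal C sketch that writes the magic string once and then each frame preceded by its two-byte length. The function names silk_write_header and silk_write_frame are hypothetical (they are not part of the SILK SDK), and the little-endian byte order of the length field is an assumption based on the bytes observed in the .bit file (a 32-byte frame shows up as 20 00).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Magic string found at the start of the SILK SDK test vectors. */
static const char SILK_MAGIC[] = "#!SILK_V3";

/* Write the file header once, before any frame. Returns 0 on success. */
int silk_write_header(FILE *out)
{
    size_t n = strlen(SILK_MAGIC);  /* 9 bytes, no terminating NUL */
    return fwrite(SILK_MAGIC, 1, n, out) == n ? 0 : -1;
}

/* Append one encoded frame: a 2-byte length, then the payload itself.
 * Assumption: the length is stored little-endian (low byte first). */
int silk_write_frame(FILE *out, const uint8_t *payload, size_t len)
{
    uint8_t prefix[2];

    if (len > 0xFFFF)
        return -1;  /* does not fit in the 2-byte length field */

    prefix[0] = (uint8_t)(len & 0xFF);         /* low byte */
    prefix[1] = (uint8_t)((len >> 8) & 0xFF);  /* high byte */

    if (fwrite(prefix, 1, 2, out) != 2)
        return -1;
    if (len > 0 && fwrite(payload, 1, len, out) != len)
        return -1;
    return 0;
}
```

The idea is to call silk_write_header once after opening the output file, then silk_write_frame for every RTP payload, in capture order.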

In Wireshark, I selected the RTP stream I wanted to decode and exported it as raw binary.

One problem was that SILK doesn't use a fixed length to represent an audio frame. By using libpcap (libpcap0.8 on Debian squeeze) though, I could simply loop over the list of packets, read each packet's length, subtract the length of the packet headers and obtain the exact payload length. I did this in C, but any other libpcap binding (e.g. for Python or Perl) would do. Using libpcap is not strictly necessary, but it helps, in particular when padding is involved.
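To make the "subtract the packet header length" step more concrete, here is a sketch in C of how the payload could be located inside one captured frame by walking the headers instead of subtracting a constant. This is not the code from my tool: the function name rtp_payload_span is hypothetical, and it assumes Ethernet II framing without VLAN tags, IPv4 and UDP. With no IP options, CSRC entries or RTP extension, the offset comes out as 14 + 20 + 8 + 12 = 54 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14  /* Ethernet II, no VLAN tag (assumption) */
#define UDP_HDR_LEN    8
#define RTP_FIXED_LEN 12

/*
 * Locate the RTP payload inside one captured frame.
 * pkt/caplen are the bytes handed over by the capture library.
 * Returns 0 and sets *off and *len on success, -1 if the frame is
 * truncated or is not IPv4/UDP carrying RTP version 2.
 */
int rtp_payload_span(const uint8_t *pkt, size_t caplen,
                     size_t *off, size_t *len)
{
    size_t ip = ETH_HDR_LEN;
    size_t ihl, ip_total, end, rtp, rtp_hdr, ext_words, pad;

    if (caplen < ip + 20)
        return -1;
    if ((pkt[ip] >> 4) != 4)              /* IP version */
        return -1;
    ihl = (size_t)(pkt[ip] & 0x0F) * 4;   /* IP header length in bytes */
    if (ihl < 20 || pkt[ip + 9] != 17)    /* protocol 17 = UDP */
        return -1;

    /* The IP total length excludes any Ethernet trailer padding. */
    ip_total = ((size_t)pkt[ip + 2] << 8) | pkt[ip + 3];
    end = ip + ip_total;
    if (end > caplen)
        return -1;

    rtp = ip + ihl + UDP_HDR_LEN;
    if (rtp + RTP_FIXED_LEN > end || (pkt[rtp] >> 6) != 2)
        return -1;

    /* Fixed header plus 4 bytes per CSRC entry. */
    rtp_hdr = RTP_FIXED_LEN + (size_t)(pkt[rtp] & 0x0F) * 4;
    if (rtp + rtp_hdr > end)
        return -1;

    if (pkt[rtp] & 0x10) {                /* X bit: header extension */
        if (rtp + rtp_hdr + 4 > end)
            return -1;
        ext_words = ((size_t)pkt[rtp + rtp_hdr + 2] << 8)
                  | pkt[rtp + rtp_hdr + 3];
        rtp_hdr += 4 + ext_words * 4;
        if (rtp + rtp_hdr > end)
            return -1;
    }

    if (pkt[rtp] & 0x20) {                /* P bit: last byte = pad count */
        pad = pkt[end - 1];
        if (pad == 0 || pad > end - (rtp + rtp_hdr))
            return -1;
        end -= pad;
    }

    *off = rtp + rtp_hdr;
    *len = end - *off;
    return 0;
}
```

In a libpcap loop, pkt and caplen would be the packet data and the captured length reported for each packet.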

Once the bitstream was ready, I decoded it to raw PCM with the decoder available in the SILK SDK (downloadable from here). I knew the original audio had been encoded at 24 kHz and 20 ms/frame, which also happen to be the decoder's default settings.

$ ./decoder ~/silk_from_rtp.bit ~/silk_from_rtp.raw

From raw PCM to a WAV file, handy to play on any PC, the step is easy and sox does the job. I just had to specify the sample rate (again 24 kHz, of course) and the encoding (16 bit signed, little-endian), and that was it!

$ sox -V -t raw -b 16 -e signed-integer -r 24000 silk_from_rtp.raw silk_from_rtp.wav
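For the curious, what this conversion amounts to for plain PCM is putting a 44-byte RIFF/WAVE header in front of the samples. The C sketch below shows that header; it is only an illustration and not how sox is implemented. The function name wav_header_pcm16_mono is hypothetical, and a single (mono) channel is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Store 16-bit and 32-bit values in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/*
 * Fill the 44-byte canonical WAV header for 16-bit signed PCM.
 * Assumption: one channel (mono). data_bytes is the size of the raw
 * PCM file that will follow the header.
 */
void wav_header_pcm16_mono(uint8_t hdr[44], uint32_t sample_rate,
                           uint32_t data_bytes)
{
    const uint16_t channels = 1;
    const uint16_t bits = 16;
    const uint16_t block_align = (uint16_t)(channels * (bits / 8));

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);            /* size after this field */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                        /* fmt chunk size */
    put_le16(hdr + 20, 1);                         /* format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align); /* bytes per second */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

For the file above, sample_rate would be 24000 and data_bytes the size of silk_from_rtp.raw.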

UPDATE (11/9/2014): The SILK SDK (and dev.skype.com) have disappeared. If you want to download it, try this (I used version 1.0.9).

14 comments:

  1. Looks like we are not the only ones working on this :)

  2. Excellent - thanks for this cool utility. BTW, I don't understand why 56 bytes in the following line:

    long payload_len = header->len - 56;

    can you kindly clarify - thanks

  3. Hi mntr0609,
    I took some time to answer your question, but I haven't found the opportunity to answer as precisely as I wanted.
    A suboptimal answer is that I've empirically observed that the payload size was always 56 bytes smaller than the packet size declared in the header by the library in use.
    I was expecting 40 bytes (12 for RTP, 8 for UDP, 20 for IP), but observed 56 and eventually used that value.
    I'll try to update with a better answer (and probably update the code too with a define, so that the intention is clearer).
    Thanks for commenting.

  4. Thanks. I was trying to decode a SILK bitstream from Wireshark. I thought the 56 bytes came from including the 14 byte Ethernet header and the 2 byte SILK payload length, and wanted to confirm.
    For some reason, when I look at the RTP packet in Wireshark that contains a SILK frame, the 2 byte frame length (that comes right after the 12 byte RTP header) does not correspond to the length of the frame. I see bigger values in the 2 bytes. Also, the stream doesn't have the "#!SILK_V3" magic number, so I added code on top of yours to write it when the output file is first opened.

    Replies
    1. Hi mntr0609, you were right in reporting this. There were indeed 2 additional bytes that were not needed. I fixed it now, and tested again successfully with 3rd party traces. Cheers.

  5. > "when I look at the RTP packet in Wireshark that contains a SILK frame, the 2 byte frame length (that comes right after the 12 byte RTP header) does not correspond to the length of the frame. I see bigger values in the 2 bytes."

    Can you give an example of the length returned by Wireshark, the length written in the .bit file and the actual length of the audio block?

    This is what I saw for example in one case ("wireshark length" -> "subtracting 56" -> "written in the .bit"):
    88 -> 32 -> 20 00
    93 -> 37 -> 25 00
    93 -> 37 -> 25 00
    96 -> 40 -> 28 00

    And yes, I manually added the magic number at the beginning of the file before adding the audio blocks. I'll add that to my code, together with a few improvements.

    Thanks.

  6. I've pushed a new version of silk_rtp_to_bitstream to GitHub.

  7. Thanks for your code!
    Can someone please send me a SILK Wireshark trace? I tried to decode my own Wireshark trace but the result was noise, so I suspect that something is wrong with my trace (maybe the packets are encoded..)
    Thanks in advance.

  8. Hi Giacomo Vacca,

    I am also working on a similar problem. I tried your code on a pcap file captured with Wireshark, but had no luck.
    I tried again with my own code to lay out the SILK stream payloads like this:
    #!SILK_V3 + 1st payloadLength (2 bytes) + 1st payload + 2nd payloadLength + 2nd payload + ....

    but I am still getting noise and am unable to play it. Please help me; I can share the file. Please drop me a mail: battula97@gmail.com

    Thanks,
    Venkat

  9. This comment has been removed by the author.

  10. I am trying to get the audio from a Skype for Business call. I have packets captured with Wireshark and I followed your blog as best I could, but I get noise, no voice. I think Skype for Business is not using SILK encoding, although the MS documentation says it does.

  11. I have found a mistake in the original version (the full header length was 56, but the correct value is 54). Please check out the latest https://github.com/giavac/silk_rtp_to_bitstream and try again. Cheers.

    Replies
    1. Thanks Giacomo, it's working fine with your updated code. It's really interesting working with the SILK codec. Thanks for your support and for sharing your knowledge on the blog :)

      -Venkat B

  12. I wrote a summary here: http://www.giacomovacca.com/2017/01/voip-calls-encoded-with-silk-from-rtp.html
