Tuesday 11 June 2013

VoIP calls encoded with SILK: from RTP to WAV


SILK is a codec developed by Skype, but it can be found in many VoIP clients, such as CSipSimple.
It comes in different flavours (sample rates and frame sizes), from narrowband (8 kHz) to super-wideband (24 kHz).
Since Wireshark doesn't let you decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this also allowed me to earn a "tumbleweed" badge on Stack Overflow.
You may argue that a Wireshark plugin would be the right solution, but that's probably for another day.

Initially I thought it would be sufficient to read the specification of the RTP payload format for SILK; the truth is that I had to reverse engineer a solution by looking at the SILK SDK's test vectors.
There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("#!SILK"), but a slightly different one ("#!SILK_V3").

More importantly, each encoded frame is not preceded by a block header, but by two bytes specifying its length.
Given these findings, it was a matter of extracting the RTP payload for each packet.
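
The file layout described above can be sketched in C as follows. This is a minimal illustration, not the code from my tool: the function names `silk_write_header` and `silk_write_frame` are hypothetical, and the little-endian byte order of the length prefix is an assumption based on the byte dumps quoted in the comments below (a 32-byte frame is written as 20 00).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Magic string observed at the start of the SILK SDK test vectors. */
static const char SILK_MAGIC[] = "#!SILK_V3";

/* Write the file header once, before the first frame. Returns 0 on success. */
int silk_write_header(FILE *out)
{
    assert(out != NULL);
    size_t n = strlen(SILK_MAGIC);   /* 9 bytes, no terminating NUL */
    return fwrite(SILK_MAGIC, 1, n, out) == n ? 0 : -1;
}

/* Write one encoded frame: a 2-byte length (assumed little-endian),
 * followed by the frame bytes themselves. Returns 0 on success. */
int silk_write_frame(FILE *out, const uint8_t *payload, uint16_t len)
{
    assert(out != NULL && payload != NULL);
    uint8_t prefix[2] = { (uint8_t)(len & 0xFF), (uint8_t)(len >> 8) };
    if (fwrite(prefix, 1, 2, out) != 2)
        return -1;
    return fwrite(payload, 1, len, out) == len ? 0 : -1;
}
```

Calling `silk_write_header` once and then `silk_write_frame` for every RTP payload, in capture order, should produce a file the SDK decoder accepts, provided the assumptions above hold.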

In Wireshark, I selected the RTP stream I wanted to decode and exported it as raw binary.

A problem was that SILK doesn't use a fixed length for an encoded audio frame. By using libpcap (libpcap0.8 on Debian squeeze), though, I could simply loop over the list of packets, read each packet's length, subtract the total header length and obtain the exact payload length. I did this in C, but any other libpcap binding (e.g. for Python or Perl) would do. Using libpcap is not strictly necessary, but it helps, in particular when padding is involved.
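
The header arithmetic can be sketched without libpcap, working on the raw bytes of one captured packet. This is only an illustration under stated assumptions: an untagged Ethernet II frame carrying IPv4 and UDP, so that in the simplest case 14 + 20 + 8 + 12 = 54 bytes precede the payload. The function name `rtp_payload` is hypothetical and is not taken from silk_rtp_to_bitstream. It relies on the UDP length field rather than the captured length, so trailing Ethernet padding is ignored, and it also honours the RTP padding bit, CSRC list and header extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14   /* Ethernet II, no VLAN tag (assumption) */
#define UDP_HDR_LEN 8
#define RTP_HDR_LEN 12   /* fixed part of the RTP header */

/* Locate the RTP payload inside a captured Ethernet/IPv4/UDP packet.
 * On success stores the payload start in *payload and returns its length;
 * returns -1 if the packet is too short or is not IPv4/UDP/RTP version 2. */
long rtp_payload(const uint8_t *pkt, size_t caplen, const uint8_t **payload)
{
    assert(pkt != NULL && payload != NULL);

    if (caplen < ETH_HDR_LEN + 20)
        return -1;
    const uint8_t *ip = pkt + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4 || ip[9] != 17)            /* IPv4 carrying UDP */
        return -1;
    size_t ip_hdr_len = (size_t)(ip[0] & 0x0F) * 4;  /* 20 without options */
    if (ip_hdr_len < 20)
        return -1;

    size_t rtp_off = ETH_HDR_LEN + ip_hdr_len + UDP_HDR_LEN;
    if (caplen < rtp_off + RTP_HDR_LEN)
        return -1;
    const uint8_t *udp = ip + ip_hdr_len;
    const uint8_t *rtp = pkt + rtp_off;
    if ((rtp[0] >> 6) != 2)                          /* RTP version 2 */
        return -1;

    /* The UDP length covers the UDP header plus the RTP packet. */
    size_t udp_len = ((size_t)udp[4] << 8) | udp[5];
    if (udp_len < UDP_HDR_LEN + RTP_HDR_LEN ||
        rtp_off - UDP_HDR_LEN + udp_len > caplen)
        return -1;
    size_t rtp_len = udp_len - UDP_HDR_LEN;

    size_t hdr = RTP_HDR_LEN + (size_t)(rtp[0] & 0x0F) * 4;  /* CSRC list */
    if (rtp[0] & 0x10) {                             /* header extension */
        if (rtp_len < hdr + 4)
            return -1;
        hdr += 4 + ((((size_t)rtp[hdr + 2] << 8) | rtp[hdr + 3]) * 4);
    }
    if (rtp_len < hdr)
        return -1;

    size_t pad = 0;
    if (rtp[0] & 0x20) {                             /* padding bit set */
        if (rtp_len == hdr)
            return -1;
        pad = rtp[rtp_len - 1];                      /* last byte = pad count */
    }
    if (hdr + pad > rtp_len)
        return -1;

    *payload = rtp + hdr;
    return (long)(rtp_len - hdr - pad);
}
```

With libpcap, the `pkt` and `caplen` arguments would come from the packet data and the `caplen` field of the packet header handed to the capture loop.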

Once the bitstream was ready, I decoded it to raw PCM with the decoder available in the SILK SDK (downloadable from here). I knew the original audio had been encoded at 24 kHz and 20 ms/frame, which also happen to be the decoder's default settings.

$ ./decoder ~/silk_from_rtp.bit ~/silk_from_rtp.raw

Going from raw PCM to a WAV file, handy to play on any PC, is easy and sox does the job. I just had to specify the sample rate (again 24 kHz, of course) and the encoding (16-bit signed, little-endian), and that was it!

$ sox -V -t raw -b 16 -e signed-integer -r 24000 silk_from_rtp.raw silk_from_rtp.wav
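
For the curious, what sox adds in front of the raw samples is essentially a 44-byte header. The sketch below is not what sox does internally, just an illustration of the canonical PCM WAV header for the parameters used above; the function name `wav_header` is hypothetical and a single (mono) channel is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store 16-bit and 32-bit values in little-endian order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)((v >> (8 * i)) & 0xFF);
}

/* Fill a 44-byte canonical WAV header for mono, 16-bit signed PCM.
 * data_bytes is the size of the raw PCM data that will follow. */
void wav_header(uint8_t hdr[44], uint32_t sample_rate, uint32_t data_bytes)
{
    assert(hdr != NULL);
    const uint16_t channels = 1;                  /* mono (assumption) */
    const uint16_t bits = 16;
    const uint16_t block_align = (uint16_t)(channels * bits / 8);

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);           /* file size minus 8 */
    memcpy(hdr + 8, "WAVEfmt ", 8);
    put_le32(hdr + 16, 16);                       /* size of the fmt chunk */
    put_le16(hdr + 20, 1);                        /* format tag 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align); /* bytes per second */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

Writing `wav_header(hdr, 24000, size_of_raw_file)` followed by the contents of silk_from_rtp.raw should give a playable file, which also shows why a wrong sample rate or sample format produces noise or distorted audio rather than an error.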

UPDATE (11/9/2014): The SILK SDK (and dev.skype.com) have disappeared. If you want to download it, try this (I used version 1.0.9).

25 comments:

  1. looks like we are not alone who work on it :)

  2. Excellent - thanks for this cool utility. BTW, I don't understand why 56 bytes in the following line:

    long payload_len = header->len - 56;

    can you kindly clarify - thanks

  3. Hi mntr0609,
    I took some time to answer your question, but I haven't found the opportunity to answer as precisely as I wanted.
    A suboptimal answer is that I've empirically observed that the payload size was always 56 bytes smaller than the packet size declared in the header by the library in use.
    I was expecting 40 bytes (12 for RTP, 8 for UDP, 20 for IP), but observed 56 and eventually used that value.
    I'll try to update with a better answer (and probably update the code too with a define, so that the intention is clearer).
    Thanks for commenting.

  4. thanks. I was trying to decode a SILK bit stream from wireshark. Thought the 56 bytes come from including the 14 byte Ethernet headers and the 2 byte silk payload length and wanted to confirm.
    for some reason, when I look at the RTP packet in the wireshark that contains a SILK frame, the 2 byte frame length (that comes right after the 12 byte RTP header) do not correspond to the length of the frame. I see bigger values in the 2 bytes. Also, the stream doesn't have the "#!SILK_V3" magic number which I added on top of your code to create and write it the first time we open the output file.

    Replies
    1. Hi mntr0609, you were right in reporting this. There were indeed 2 additional bytes that were not needed. I fixed it now, and tested again successfully with 3rd party traces. Cheers.

  5. > "when I look at the RTP packet in the wireshark that contains a SILK frame, the 2 byte frame length (that comes right after the 12 byte RTP header) do not correspond to the length of the frame. I see bigger values in the 2 bytes."

    Can you give an example of the length returned by Wireshark, the length written in the .bit file and the actual length of the audio block?

    This is what I saw for example in one case ("wireshark length" -> "subtracting 56" -> "written in the .bit"):
    88 -> 32 -> 20 00
    93 -> 37 -> 25 00
    93 -> 37 -> 25 00
    96 -> 40 -> 28 00

    And yes, I manually added the magic number at the beginning of the file before adding the audio blocks. I'll add that to my code, together with a few improvements.

    Thanks.

  6. I've pushed a new version of silk_rtp_to_bitstream on github.

  7. Thanks for your code!
    Can someone please send me a SILK Wireshark trace? I tried to decode my own Wireshark trace but the result was noise, so I suspect that something is wrong with my trace (maybe the packets are encoded...)
    Thanks in advance.

  8. Hi Giacomo Vacca,

    I am also working on a similar problem. I tried your code with a pcap file captured with Wireshark, but had no luck.
    I tried again with my own code to build a SILK stream from the payloads, like this:
    #!SILK_V3 + 1st payload length (2 bytes) + 1st payload + 2nd payload length + 2nd payload + ....

    but I am still getting noise and am unable to play it. Please help me, I can share the file. Please drop me a mail: battula97@gmail.com

    Thanks,
    Venkat

  9. This comment has been removed by the author.

  10. I am trying to get the audio from a Skype for Business call. I have packets captured with Wireshark and I followed your blog as carefully as I could, but I get noise and no voice. I think Skype for Business is not using SILK encoding, although the MS documentation says it is.

    ReplyDelete
  11. I have found a mistake in the original version (full header len was 56, but the correct value is 54). Please check out latest https://github.com/giavac/silk_rtp_to_bitstream and try again. Cheers.

    Replies
    1. Thanks Giacomo, it's working fine with your updated code. It's really interesting working with the SILK codec. Thanks for your support, your blog and the knowledge sharing :)

      -Venkat B

    2. Is it too late to say Thanks? :)

  12. I wrote a summary here: http://www.giacomovacca.com/2017/01/voip-calls-encoded-with-silk-from-rtp.html

  13. Hello Giacomo,

    I tried to download the "SILK SDK", but it seems the link is unavailable.

    Do you have any other source from which we may download it?

    Thank you,

    Gilmar Silva

    Replies
    1. Hi Gilmar,
      unfortunately they have unpublished that source code.

      Would you please check if this works for you? https://github.com/collects/silk

      I've had a quick look and it may be it.

      Best,
      Giacomo

  14. The code worked and I was able to decode the ".bit" file. Thank you.
    Unfortunately, I only received noise after converting to WAV.
    I am researching how to deal with that behavior.

  15. Would you still happen to have a sample pcap of this kicking around?

    Thanks,

  16. Would you still happen to have a sample pcap of this kicking around?

    Thanks,

    Replies
    1. Do you want to try with this pcap file?
      https://drive.google.com/file/d/1CaZ75fKQtwpUZXqagzo-2zBhUC4rl56-/view?usp=sharing

      Giacomo

  17. Hello! I would want to offer a enormous thumbs up to the great info you might have here with this post. We are returning to your site for additional soon. VoIP Service Atlanta

  18. Hi,
    for a university project my teacher asked me to generate an RTP pcap with the SILK codec.
    Can you tell me how I can generate an RTP pcap file whose audio is encoded with SILK?

  19. Great work! I saw Zoom uses SILK for ringtones, these files (c:\Program Files\Zoom\bin\ringtone\Ukulele.pcm) start with the #!SILK_V3 magic header. I tried decoding them to a bitstream with the windows equivalent tooling you linked: silk_v3_decoder.exe Ukulele.pcm Ukulele.bit. The output is a file with 960 SOH characters. My goal is to have custom ringtones in Zoom via the reverse process. Thanks!

  20. I wanted to thank you for this excellent read!! I definitely loved every little bit of it. I have you bookmarked your site to check out the new stuff you post. best VoIP business plans.

