SILK is a codec defined by Skype, but it can be found in many VoIP clients, such as CSipSimple.
It comes in different flavours (sample rates and frame sizes), from narrowband (8 kHz) to super-wideband (24 kHz).
Since Wireshark doesn't allow you to decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this has also allowed me to earn a "tumbleweed" badge on Stack Overflow.
You may argue that a Wireshark plugin would be the right solution, but that's probably for another day.
Initially I thought it was sufficient to read the
specification for RTP payload when using SILK; the truth is
that I had to reverse engineer a solution by looking at SILK SDK's test
vectors.
There, I discovered that a file containing SILK audio
doesn't have the file header indicated in the IETF draft ("#!SILK"), but a
slightly different one ("#!SILK_V3").
More importantly, each encoded frame is not preceded by a block header, but by two bytes specifying its length.
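To make the layout concrete, here is a minimal Python sketch of how such a file could be assembled: the "#!SILK_V3" header followed, for each frame, by a two-byte length and the frame itself. The function name is hypothetical, and the little-endian byte order is an assumption (it is what a native 16-bit write produces on x86).

```python
import struct

# File header found in the SILK SDK test vectors.
MAGIC = b"#!SILK_V3"


def build_silk_bitstream(frames):
    """Return the bytes of a .bit file: header, then [length][frame] per frame.

    `frames` is an iterable of bytes objects, one per encoded SILK frame.
    """
    out = bytearray(MAGIC)
    for frame in frames:
        # Two-byte length prefix; little-endian is an assumption.
        out += struct.pack("<H", len(frame))
        out += frame
    return bytes(out)
```

With this layout, a 32-byte frame would be preceded by the bytes 20 00.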
Given these findings, it was a matter of extracting the RTP payload for each packet.
In Wireshark, I selected the RTP stream I wanted to decode and exported it as raw binary.
One problem was that SILK doesn't use a fixed length
to represent an audio frame. By using libpcap (libpcap0.8 on Debian squeeze), though, I could
simply loop over the list of packets, read each packet length, subtract the
length of the packet headers and obtain the exact payload length. I did this in C,
but any other libpcap binding (e.g. for Python or Perl) would do. Using libpcap is not strictly necessary, but it helps,
in particular when padding is involved.
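As an illustration of that loop, the sketch below does the same thing in Python without libpcap, by reading the classic .pcap file format directly. It is not the C tool I used. It assumes the capture contains only the RTP stream of interest, Ethernet framing, IPv4 without options, RTP without CSRC entries or header extensions, and no padding; the function name is hypothetical.

```python
import struct

ETHERNET_LEN = 14
IPV4_LEN = 20  # assumes no IP options
UDP_LEN = 8
RTP_LEN = 12   # assumes no CSRC entries and no header extension
HEADERS_LEN = ETHERNET_LEN + IPV4_LEN + UDP_LEN + RTP_LEN  # 54 bytes


def rtp_payloads(pcap_bytes):
    """Yield the RTP payload of every packet in a classic .pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # capture written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, pos
        )
        pos += 16
        packet = pcap_bytes[pos:pos + incl_len]
        pos += incl_len
        if len(packet) > HEADERS_LEN:
            yield packet[HEADERS_LEN:]
```

Each yielded payload is one SILK frame, whose length is simply the captured packet length minus the 54 bytes of headers.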
Once the bitstream was ready, I decoded it to raw PCM
format with the decoder available in the SILK SDK (downloadable from here). I knew the
original encoded audio was at 24 kHz and 20 msec/frame, which also
happen to be the decoder's default settings.
$ ./decoder ~/silk_from_rtp.bit ~/silk_from_rtp.raw
Going from raw PCM to a WAV file, handy to play on any PC, is
an easy step and sox does the job. I just had to specify the
sample rate (again 24 kHz, of course) and the encoding (16-bit signed,
little-endian), and that was it!
$ sox -V -t raw -b 16 -e signed-integer -r 24000 silk_from_rtp.raw silk_from_rtp.wav
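If sox is not at hand, the same wrapping can be sketched with Python's standard wave module. This is only an equivalent of the command above under the same assumptions (16-bit signed little-endian samples at 24 kHz); mono audio is a further assumption, and the function name is hypothetical.

```python
import wave


def raw_pcm_to_wav(raw_path, wav_path, sample_rate=24000):
    """Wrap headerless 16-bit signed little-endian PCM in a WAV container."""
    with open(raw_path, "rb") as raw:
        pcm = raw.read()
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)            # mono is an assumption
        wav.setsampwidth(2)            # 16 bits per sample
        wav.setframerate(sample_rate)  # 24 kHz, as in the sox command
        wav.writeframes(pcm)
```

For example, raw_pcm_to_wav("silk_from_rtp.raw", "silk_from_rtp.wav") would produce a file comparable to the one generated by sox.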
UPDATE (11/9/2014): The SILK SDK (and dev.skype.com) have disappeared. If you want to download it, try this (I used version 1.0.9).
Looks like we are not the only ones working on it :)
Excellent - thanks for this cool utility. BTW, I don't understand why 56 bytes are subtracted in the following line:
long payload_len = header->len - 56;
Can you kindly clarify? Thanks.
Hi mntr0609,
I took some time to answer your question, but I haven't found the opportunity to answer as precisely as I wanted.
A suboptimal answer is that I've empirically observed that the payload size was always 56 bytes smaller than the packet size declared in the header by the library in use.
I was expecting 40 bytes (12 for RTP, 8 for UDP, 20 for IP), but observed 56 and eventually used that value.
I'll try to update with a better answer (and probably update the code too with a define, so that the intention is clearer).
Thanks for commenting.
Thanks. I was trying to decode a SILK bitstream from Wireshark. I thought the 56 bytes came from including the 14-byte Ethernet header and the 2-byte SILK payload length, and wanted to confirm.
For some reason, when I look at an RTP packet in Wireshark that contains a SILK frame, the 2-byte frame length (which comes right after the 12-byte RTP header) does not correspond to the length of the frame. I see bigger values in the 2 bytes. Also, the stream doesn't have the "#!SILK_V3" magic number, which I added on top of your code so that it is written when the output file is first opened.
Hi mntr0609, you were right in reporting this. There were indeed 2 additional bytes that were not needed. I fixed it now, and tested again successfully with 3rd party traces. Cheers.
> "when I look at an RTP packet in Wireshark that contains a SILK frame, the 2-byte frame length (which comes right after the 12-byte RTP header) does not correspond to the length of the frame. I see bigger values in the 2 bytes."
Can you give an example of the length returned by Wireshark, the length written in the .bit file and the actual length of the audio block?
This is what I saw for example in one case ("wireshark length" -> "subtracting 56" -> "written in the .bit"):
88 -> 32 -> 20 00
93 -> 37 -> 25 00
93 -> 37 -> 25 00
96 -> 40 -> 28 00
And yes, I manually added the magic number at the beginning of the file before adding the audio blocks. I'll add that to my code, together with a few improvements.
Thanks.
I've pushed a new version of silk_rtp_to_bitstream on github.
Thanks for your code!
Can someone please send me a SILK Wireshark trace? I tried to decode my own Wireshark trace but the result was noise, so I suspect that something is wrong with my trace (maybe the packets are encoded...)
Thanks in advance.
Hi Giacomo Vacca,
I am also working on a similar problem. I tried your code on a pcap file captured with Wireshark, but had no luck.
I tried again with my own code to lay out the SILK stream payloads like this:
#!SILK_V3 + 1st payloadLength (2 bytes) + 1st payload + 2nd payloadLength + 2nd payload + ....
but I am still getting noise that I am unable to play. Please help me, I can share the file. Please drop me a mail: battula97@gmail.com
Thanks,
Venkat
I am trying to get audio from a Skype for Business call. I have packets captured with Wireshark and I followed your blog with all the brains I have, but I get noise, no voice. I think Skype for Business is not using SILK encoding, but the MS documentation says it is.
I have found a mistake in the original version (the full header length was 56, but the correct value is 54). Please check out the latest https://github.com/giavac/silk_rtp_to_bitstream and try again. Cheers.
Thanks Giacomo, it's working fine with your updated code. It's really interesting working with the SILK codec. Thanks for your support, your blog and the knowledge sharing :)
-Venkat B
Is it too late to say Thanks? :)
I wrote a summary here: http://www.giacomovacca.com/2017/01/voip-calls-encoded-with-silk-from-rtp.html
Hello Giacomo,
I tried to download the "SILK SDK", but it seems the link is unavailable.
Do you have any other source from which we may download it?
Thank you,
Gilmar Silva
Hi Gilmar,
unfortunately they have unpublished that source code.
Would you please check if this works for you? https://github.com/collects/silk
I've had a quick look and it may be it.
Best,
Giacomo
The code worked and I was able to decode the ".bit" file. Thank you.
Unfortunately, I only received noise after converting to WAV.
I am researching how to deal with that behavior.
Would you still happen to have a sample pcap of this kicking around?
Thanks,
Do you want to try with this pcap file?
https://drive.google.com/file/d/1CaZ75fKQtwpUZXqagzo-2zBhUC4rl56-/view?usp=sharing
Giacomo
Hi,
for a university project my teacher asked me to generate an RTP pcap with the SILK codec.
Can you tell me how I can generate an RTP pcap file whose audio is encoded with SILK?
Great work! I saw Zoom uses SILK for ringtones, these files (c:\Program Files\Zoom\bin\ringtone\Ukulele.pcm) start with the #!SILK_V3 magic header. I tried decoding them to a bitstream with the windows equivalent tooling you linked: silk_v3_decoder.exe Ukulele.pcm Ukulele.bit. The output is a file with 960 SOH characters. My goal is to have custom ringtones in Zoom via the reverse process. Thanks!