SILK is a codec defined by Skype, but it can be found in many VoIP clients, like CSipSimple.
It comes in different flavours (sample rates and frame sizes), from narrowband (8 kHz) to super-wideband (24 kHz).
Since Wireshark doesn't allow you to decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this has also allowed me to earn a "tumbleweed" badge on Stack Overflow.
You may argue that a Wireshark plugin would be the right solution, but that's probably for another day.
Initially I thought it would be sufficient to read the specification for the RTP payload format when using SILK; the truth is that I had to reverse engineer a solution by looking at the SILK SDK's test vectors.
There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("#!SILK"), but a slightly different one ("#!SILK_V3").
More importantly, each encoded frame is not preceded by a block header, but by two bytes specifying its length.
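The file layout described above can be sketched in a few lines of Python. This is only an illustration, not the SDK's own code: the function name `build_silk_bitstream` is made up, the magic string is written as `#!SILK_V3` (as I recall it from the SDK sources, so check it against a test vector), and the two-byte length is assumed to be a little-endian 16-bit integer. It also assumes that each RTP payload is passed in as one frame.

```python
import struct

# Magic string at the start of a SILK SDK bitstream file.
# Assumption: spelled "#!SILK_V3" with no trailing newline.
SILK_MAGIC = b"#!SILK_V3"


def build_silk_bitstream(frames):
    """Return the bytes of a SILK file built from an iterable of frames.

    Each frame is prefixed by its length, stored in two bytes.
    Assumption: the length is a little-endian signed 16-bit integer.
    """
    out = bytearray(SILK_MAGIC)
    for frame in frames:
        out += struct.pack("<h", len(frame))  # two-byte length prefix
        out += frame                          # the encoded frame itself
    return bytes(out)


if __name__ == "__main__":
    # Two dummy "frames" just to show the layout.
    data = build_silk_bitstream([b"\x01\x02", b"\x03"])
    print(data)
```

Writing the result to disk with `open("silk_from_rtp.bit", "wb").write(data)` should give a file in the layout the decoder's test vectors use, under the assumptions above.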
Given these findings, it was a matter of extracting the RTP payload from each packet.
In Wireshark, I selected the RTP stream I wanted to decode and exported it as raw binary.
A problem was that SILK doesn't use a fixed length to represent an audio frame. By using libpcap (libpcap0.8 on Debian squeeze), though, I could simply loop over the list of packets, read each packet's length, subtract the packet header length and obtain the exact payload length.
I did this in C, but any other libpcap binding (e.g. for Python or Perl) would do. Using libpcap is not strictly necessary, but it helps, in particular when padding is involved.
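To make the "subtract the header length" step concrete, here is a minimal Python sketch that strips the RTP header from a single packet. It is not the C code mentioned above: the function name `rtp_payload` is hypothetical, and it assumes you already have the UDP payload (i.e. the Ethernet, IP and UDP headers have been removed, for instance by using the UDP length field). The header layout follows RFC 3550: 12 fixed bytes, optional CSRC entries, an optional header extension, and optional trailing padding whose last byte gives the padding length.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # fixed part of the RTP header (RFC 3550)


def rtp_payload(packet):
    """Return the payload of one RTP packet, given the UDP payload bytes."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet too short for an RTP header")

    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    # Skip the fixed header and the CSRC list (4 bytes per entry).
    offset = RTP_FIXED_HEADER_LEN + 4 * csrc_count

    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # The extension length is counted in 32-bit words and
        # excludes the 4-byte extension header itself.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(packet)
    if has_padding:
        pad_len = packet[-1]  # last byte holds the padding length
        if pad_len == 0 or pad_len > end - offset:
            raise ValueError("invalid padding length")
        end -= pad_len

    if offset > end:
        raise ValueError("header longer than packet")
    return packet[offset:end]
```

Each returned payload can then be written out with its two-byte length prefix, which is why knowing the exact payload length per packet matters.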
Once the bitstream was ready, I decoded it to raw PCM with the decoder available in the SILK SDK (downloadable from here). I knew the original encoded audio was at 24 kHz and 20 ms/frame, which also happen to be the decoder's default settings.
$ ./decoder ~/silk_from_rtp.bit ~/silk_from_rtp.raw
Going from raw PCM to a WAV file, handy to play on any PC, is an easy step, and sox does the job. I just had to specify the sample rate (again 24 kHz, of course) and the encoding (16-bit signed, little-endian), and that was it!
$ sox -V -t raw -b 16 -e signed-integer -r 24000 silk_from_rtp.raw silk_from_rtp.wav
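If sox is not at hand, the same conversion can be sketched with Python's standard `wave` module. This is an alternative to the command above, not what was actually used here. The function name `pcm_to_wav` is made up, and the sketch assumes mono audio, 16-bit signed samples and a little-endian host.

```python
import wave


def pcm_to_wav(raw_path, wav_path, sample_rate=24000):
    """Wrap raw 16-bit signed PCM in a WAV container.

    Assumptions: mono audio, samples already in little-endian order.
    """
    with open(raw_path, "rb") as raw:
        pcm = raw.read()

    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)           # mono (assumed)
        wav.setsampwidth(2)           # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


if __name__ == "__main__":
    import sys

    # Usage: python pcm_to_wav.py silk_from_rtp.raw silk_from_rtp.wav
    pcm_to_wav(sys.argv[1], sys.argv[2])
```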
UPDATE (11/9/2014): The SILK SDK (and dev.skype.com) has disappeared. If you want to download it, try this (I used version 1.0.9).