Saturday, 29 October 2016

Opus/G.711 Transcoding For The Practical Man

Following my earlier post on "Opus SDP negotiation" in the series "For The Practical Man", I'm presenting today a related topic: Opus audio codec when transcoding is involved.

Most of the providers of PSTN connectivity require the simplest possible VoIP codec: G.711 (which comes in two flavours, u-law and a-law).

G.711 is a form of PCM encoding at 8000 samples per second: 8000 times per second an audio sample is encoded with 8 bits. Sometimes Comfort Noise can be used, reducing the bitrate when silence is detected, but otherwise the typical working principle is a continuous flow of digitally-encoded packets of voice. u-law and a-law just use a different way to encode the data. (If you're curious about what silence looks like in G.711, I wrote a post about it some time ago.)

G.729 is another widely adopted codec, but I'll leave it for another day.

With 8000 samples per second and 8 bits dedicated to each sample, G.711 requires a net bit rate of 64 Kbps. "Net" because each packet will travel over the IP network with the IP/UDP/RTP overhead (40 Bytes, accounting for an additional 16 Kbps at 20 msec packetization). And this in each direction.
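The arithmetic above can be sketched in a few lines of Python (the 40-byte figure is the sum of the IP, UDP and RTP headers: 20 + 8 + 12):

```python
# Back-of-the-envelope G.711 bandwidth, per direction.
SAMPLE_RATE = 8000            # samples per second
BITS_PER_SAMPLE = 8
PTIME_MS = 20                 # packetization interval
HEADERS_BYTES = 20 + 8 + 12   # IP + UDP + RTP headers per packet

net_bps = SAMPLE_RATE * BITS_PER_SAMPLE                 # 64000 bps
packets_per_second = 1000 // PTIME_MS                   # 50 packets/s
overhead_bps = HEADERS_BYTES * 8 * packets_per_second   # 16000 bps

print(net_bps, overhead_bps, net_bps + overhead_bps)
# 64000 16000 80000 -> 80 Kbps on the wire, per direction
```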

Except in cases where Comfort Noise is used, G.711 is not able to "recover" latency, so every little delay during the network transport will either add to the overall delay, or cause the packet to be dropped, should it not arrive in time for the receiving jitter buffer to handle it correctly.

I hope I'm passing the message that G.711 is expensive and inefficient.

Today though G.711 is a sort of common denominator between communicating parties. It's the "last resort", if you will. In fact it's been chosen by the standardization committee working on WebRTC, together with the Opus codec, for audio. WebRTC requires support for these two audio codecs (see RFC 7874, chapter 3).

Opus is a much better solution. It's suitable for VoIP (low bitrate, low sample rates) but also for music (very high bitrate, high sample rates). It's flexible enough to adapt the bitrate during a call. When Variable Bitrate is enabled, Opus tries to optimize the bitrate based on network conditions (e.g. packet loss). If there's silence, the Opus encoder keeps sending packets at the required packet rate, but makes them smaller. As a more drastic measure, Opus can be instructed to perform DTX (Discontinuous Transmission): silence is not sent at all.

Furthermore, Opus has an error correction mechanism: FEC (Forward Error Correction). FEC is a technique by which a packet of encoded audio doesn't simply contain the encoded audio at the desired quality for the current packet, but also a low bitrate encoding of the previous packet. This means that if packet N is lost, the decoder can generate a lower quality approximation of it thanks to the FEC information received in packet N+1. Of course there's a price to pay: FEC information consumes bandwidth without improving quality! It's a redundancy system whose advantages apply only in case of packet loss.
The other caveat is that given FEC works only on contiguous frames, if two consecutive packets are lost, then the decoder doesn't have any FEC information to reconstruct one of them. That data is lost forever.

What comes in handy in these cases is a different technique: PLC (Packet Loss Concealment). PLC is able to reconstruct missing packets by interpolation, and so it also works when more than one packet is lost.
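To make the interplay concrete, here's a conceptual sketch (plain Python, not the libopus API; all names are illustrative) of how a receiver can choose between normal decoding, FEC and PLC for a given sequence number:

```python
def recover(seq, received):
    """Decide how to produce audio for sequence number `seq`.

    `received` maps RTP sequence numbers to their (opaque) payloads.
    """
    if seq in received:
        return "decode"   # packet arrived: normal decoding
    if seq + 1 in received:
        return "fec"      # next packet carries a low-bitrate copy of this one
    return "plc"          # consecutive loss: conceal by interpolation

received = {1: b"...", 2: b"...", 4: b"..."}
print(recover(2, received))  # decode
print(recover(3, received))  # fec (packet 4 arrived)
print(recover(5, received))  # plc (packets 5 and 6 both missing)
```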

The size of the packets also has an impact on error recovery. Considering the overhead required to transmit a packet of voice over an IP network, "grouping" longer intervals of sound allows a smaller bandwidth usage. A stream with packets representing 40 ms will require less bandwidth than the same audio with packets representing 20 ms. The drawback is that losing a bigger packet is more "traumatic", so a compromise must be found.
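A quick calculation, under the same 40-byte header assumption as before, shows how the header overhead shrinks as the packetization interval grows:

```python
HEADERS_BYTES = 40  # IP + UDP + RTP headers per packet

def overhead_bps(ptime_ms):
    # packets per second, times header bits per packet
    return HEADERS_BYTES * 8 * (1000 // ptime_ms)

print(overhead_bps(20))  # 16000 bps of headers at 20 ms
print(overhead_bps(40))  # 8000 bps at 40 ms: half the overhead
```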

Now, coming back to G.711 and Opus, there's one case where they are not just mutually exclusive, but work together. This happens when you would like to use Opus given its advantages, but you must use G.711 for interoperability reasons. Enter "transcoding". One entity, like a mobile app dealing with 3G (or worse), uses Opus, but the call has to be routed to a "traditional" GSM or landline phone, and the provider of this routing mandates G.711. It can be assumed that the network link between you, the service provider of the mobile app, and the PSTN provider, has properties that allow G.711 streams to be handled. Bandwidth efficiency won't be great but hey, this is life.

Transcoding means that one stream has to be decoded from one codec (e.g. from Opus) and then re-encoded with the other codec (e.g. G.711).
This operation is computationally demanding. And if you compare it with a scenario where the encoded packets just flow through without needing to be touched, you can see why transcoding is not desirable. It probably reduces any system's capacity by at least an order of magnitude.
But as I wrote earlier, sometimes it's just a constraint to accept.

In the field of Open Source VoIP there are some well known applications able to perform audio transcoding, and I focus here on FreeSWITCH. FreeSWITCH can be configured to accept incoming calls with either Opus or G.711, and transcode the streams one into the other depending on the needs. For example, mobile app to FreeSWITCH will use Opus, FreeSWITCH to PSTN provider will use G.711. And the other way around.

Opus can work with sample rates up to 48000 Hz, which means 48000 audio samples each second, and a bitrate up to 510000 bps. The audio quality can be so good that Opus can be used to encode music. When dealing with VoIP though the key characteristics are not audio quality per se, but the compromise between available bandwidth, network conditions, and voice intelligibility.

When transcoding though from "high quality" (potentially Opus) to low quality (G.711), and vice versa, the advantages of higher sample rates are somewhat lost. I have this analogy in mind, that works for my brain: it's like connecting two pipes (as in physical, plumbing pipes) with very different diameters. When water comes flowing from the narrow pipe, there's no advantage in making it flow through a wider pipe: the flow is limited upstream. Similarly, the wider pipe won't be able to transfer all the water to the narrower one, and water leaks will appear. If you think this analogy is not quite right because what counts in plumbing pipes is also the speed of water, be kind and ignore the whole thing :-)

All this to say that it's possible to make bandwidth usage more efficient, without decreasing quality, by using Opus at 8000 samples/second, instead of the potential 48000. Furthermore, it's possible to limit the average bitrate, knowing that "quality can't be worse". Surely a compromise must be found, but the main principle is that since one side is "low quality" it's useless to try and "create quality" on the other side.

All this reasoning has been reflected in the work done inside the Libon project by Dragos. Recently we've tried to put down all this info in some sort of structured and (at least in our intentions) comprehensive document (FreeSWITCH and the Opus audio codec).

This document describes the usage of Opus inside FreeSWITCH from various points of view: configuration, installation, debugging, development. What we also wanted to achieve was a common terminology, to ease information sharing and the discussions around this topic.


If you've made it to this point reading the article it means you're really interested in this topic: congratulations. Please take some time to read that document too and feel free to send over any feedback or question you may have, thank you. This is all just a learning process. 

Thursday, 29 September 2016

Opus negotiation for the practical man

Opus [0] is a versatile audio codec, with a variable sample rate and bitrate, suitable for both music and speech. It is defined in RFC 6716 [1] and required by WebRTC [2].

Opus can operate at various sample rates, from 8 kHz to 48 kHz, and at variable bitrates, from 6 kbit/s to 510 kbit/s.

The RTP payload format defined for Opus in RFC 7587 [3] explains the use of media type parameters in SDP, and this article aims to analyze them and show in particular how "asymmetric streams" can be achieved.

This is an example of SDP defining an Opus offer or answer:

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
       maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
       a=ptime:40
       a=maxptime:40 
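As an aside, the fmtp parameter list in an SDP like the one above can be read into a dictionary with a few lines of Python (a minimal sketch: a real SDP parser must also handle line folding, case-insensitivity and malformed input):

```python
def parse_fmtp(line):
    # "a=fmtp:101 key=value; key=value; ..."
    _, _, params = line.partition(" ")
    out = {}
    for item in params.split(";"):
        key, _, value = item.strip().partition("=")
        if key:
            out[key] = value
    return out

fmtp = parse_fmtp("a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000; "
                  "maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0")
print(fmtp["maxplaybackrate"], fmtp["useinbandfec"])
# 16000 1
```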


Let's clarify one thing immediately, about rtpmap.

rtpmap


As specified in RFC 7587 Ch. 7, the media subtype portion of rtpmap must always be 'opus/48000/2' (48000 samples/sec, 2 channels), regardless of the actual sample rate used. So you can happily leave this configuration element out of your thoughts, even if you want to use a narrowband version of Opus.

e.g.:

a=rtpmap:96 opus/48000/2

Another less than intuitive aspect to clarify is how RTP timestamps are managed, given that RTP may carry audio at variable sample rates.

RTP timestamp


From RFC 7587, Ch. 4.1:

   Opus supports 5 different audio bandwidths, which can be adjusted
   during a stream.  The RTP timestamp is incremented with a 48000 Hz
   clock rate for all modes of Opus and all sampling rates.  The unit
   for the timestamp is samples per single (mono) channel.  The RTP
   timestamp corresponds to the sample time of the first encoded sample
   in the encoded frame.  For data encoded with sampling rates other
   than 48000 Hz, the sampling rate has to be adjusted to 48000 Hz.

This can be interpreted in this way: "The timestamp must always be set as if the sample rate is 48000 Hz."

Default case: the encoder is set at 48 kHz. A 20 msec frame contains 960 (48000 samples/sec * 20 msec) samples.
When the encoder is set at 8 kHz, instead, a 20 msec frame contains 160 (8000 samples/sec * 20 msec) samples. The timestamp in the RTP packet must be adapted, so that the sample rate is normalised to 48 kHz, by multiplying the number of samples by 6 (48000/8000).

In both cases though a 20 msec frame will have an RTP representation with 960 "time clicks".
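The normalisation above can be expressed directly (plain arithmetic; function names are illustrative):

```python
RTP_CLOCK = 48000  # Opus RTP timestamps always use a 48000 Hz clock

def samples_per_frame(sample_rate, ptime_ms):
    return sample_rate * ptime_ms // 1000

def rtp_timestamp_increment(ptime_ms):
    # independent of the encoder's actual sample rate
    return samples_per_frame(RTP_CLOCK, ptime_ms)

print(samples_per_frame(48000, 20))   # 960 samples in a 20 ms frame
print(samples_per_frame(8000, 20))    # 160 samples; x6 brings it to 960
print(rtp_timestamp_increment(20))    # 960 "time clicks" either way
```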

Now we start looking at the parameters that help the two parties in setting their encoders and decoders.

maxplaybackrate


From RFC 7587, Ch. 6.1:

     maxplaybackrate:  a hint about the maximum output sampling rate that
      the receiver is capable of rendering in Hz.  The decoder MUST be
      capable of decoding any audio bandwidth, but, due to hardware
      limitations, only signals up to the specified sampling rate can be
      played back.  Sending signals with higher audio bandwidth results
      in higher than necessary network usage and encoding complexity, so
      an encoder SHOULD NOT encode frequencies above the audio bandwidth
      specified by maxplaybackrate.  This parameter can take any value
      between 8000 and 48000, although commonly the value will match one
      of the Opus bandwidths (Table 1).  By default, the receiver is
      assumed to have no limitations, i.e., 48000.

This optional parameter is telling the encoder on the other side: "Since I won't be able to play at rates higher than `maxplaybackrate` you can save resources and bandwidth by limiting the encoding rate to this value."

A practical case is transcoding from Opus to G.711, where the final playback rate will be 8000 Hz anyway.

sprop-maxcapturerate


The mirror-image (and equally optional) parameter is sprop-maxcapturerate, defined in RFC 7587 Ch. 6.1:

     sprop-maxcapturerate:  a hint about the maximum input sampling rate
      that the sender is likely to produce.  This is not a guarantee
      that the sender will never send any higher bandwidth (e.g., it
      could send a prerecorded prompt that uses a higher bandwidth), but
      it indicates to the receiver that frequencies above this maximum
      can safely be discarded.  This parameter is useful to avoid
      wasting receiver resources by operating the audio processing
      pipeline (e.g., echo cancellation) at a higher rate than
      necessary.  This parameter can take any value between 8000 and
      48000, although commonly the value will match one of the Opus
      bandwidths (Table 1).  By default, the sender is assumed to have
      no limitations, i.e., 48000.

This parameter is telling the decoder on the other side: "Since I won't be able to produce audio at rates higher than `sprop-maxcapturerate` you can save resources by limiting the decoding rate to this value."

A practical example is transcoding from G.711 to Opus, with the source always limited to a capture rate of 8000 samples/sec.

maxaveragebitrate


An additional element, maxaveragebitrate, refers to the maximum average bitrate that the decoder will be able to manage. This is a hint that it's not worth for the remote encoder to use higher bitrates, and that it can instead save resources.

From RFC 7587, Ch. 6.1:

     maxaveragebitrate:  specifies the maximum average receive bitrate of
      a session in bits per second (bit/s).  The actual value of the
      bitrate can vary, as it is dependent on the characteristics of the
      media in a packet.  Note that the maximum average bitrate MAY be
      modified dynamically during a session.  Any positive integer is
      allowed, but values outside the range 6000 to 510000 SHOULD be
      ignored.

This parameter is telling the remote encoder: "Since my decoder can't handle bitrates higher than maxaveragebitrate, you can save computation power and bandwidth by limiting your encoder bitrate to this value."

A practical example could be a mobile client that wants to ensure the download bandwidth is not saturated. Note that this value refers only to the initial negotiation (SDP offer/answer), while the parties can negotiate different values during an active call.

Asymmetric negotiation


Given the interpretations above, it also seems possible to negotiate asymmetric streams: the two entities involved can encode and decode at different rates when appropriate.

In particular, if we imagine an entity with local parameters:

maxplaybackrate=Da; sprop-maxcapturerate=Ea; maxaveragebitrate=Fa

and remote parameters:

maxplaybackrate=Db; sprop-maxcapturerate=Eb; maxaveragebitrate=Fb

then this entity can set the decoder at a sample rate of min(Da, Eb) and the encoder at a sample rate of min(Ea, Db) and bitrate at Fb.

Similarly and intuitively, the other entity involved can set the decoder at a sample rate of min(Db, Ea) and the encoder at a sample rate of min(Eb, Da) and bitrate Fa.

All these values are optional, as mentioned above, so various permutations are possible here. In particular, when maxaveragebitrate is not provided, it's assumed to be the maximum (510000 bps).
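The decision rule can be sketched as follows (keys follow the fmtp attribute names; the dicts are illustrative, and the defaults reflect RFC 7587's "no limitation" values of 48000 Hz and 510000 bps):

```python
def settings(local, remote):
    Da = local.get("maxplaybackrate", 48000)
    Ea = local.get("sprop-maxcapturerate", 48000)
    Db = remote.get("maxplaybackrate", 48000)
    Eb = remote.get("sprop-maxcapturerate", 48000)
    Fb = remote.get("maxaveragebitrate", 510000)
    return {
        "decoder_rate": min(Da, Eb),    # what I can play vs what they'll send
        "encoder_rate": min(Ea, Db),    # what I can capture vs what they can play
        "encoder_bitrate": Fb,          # capped by what they can receive
    }

a = {"maxplaybackrate": 8000, "sprop-maxcapturerate": 8000}
b = {"maxplaybackrate": 48000, "maxaveragebitrate": 20000}
print(settings(a, b))
# {'decoder_rate': 8000, 'encoder_rate': 8000, 'encoder_bitrate': 20000}
```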

I hope this can clarify some subtleties, or at least open a table for discussion and eventually lead to a better understanding of the topic.

References


[0] https://www.opus-codec.org/
[1] http://tools.ietf.org/html/rfc6716
[2] https://tools.ietf.org/html/draft-ietf-rtcweb-audio-10
[3] https://tools.ietf.org/html/rfc7587
 

Saturday, 17 September 2016

Deploying Homer with Puppet

Fan of Homer? So am I, and as sometimes happens I'm a fan who could join the team!

If despite the title of this post you're still reading, then it's a good sign and we can move on.

Homer is a vast project that aims to provide a tool, with a GUI, to correlate all the signalling, RTCP stats, events and logs in your RTC network. It focuses heavily on SIP, for historical reasons, but it's also an extensible framework to store other types of signalling, correlate data, and compute statistics. People browsing their GitHub account are often heard saying "Do they have this too? And this? Wow!".

It is compatible off the shelf with common applications like Kamailio, OpenSIPS, FreeSWITCH and Asterisk, so if you're into VoIP, adding Homer to your platform is as easy as installing it and telling your apps where to send their data. There are also standalone tools, like captagent, and nodejs apps to parse and collect specific logs to be associated with the related signalling, plus a plethora of libraries, including a C one.

Anyway the topic is extremely vast and you can find a lot (a lot) of information on the sipcapture website.

Lately I've been working on Homer deployments using Puppet, a Configuration Management tool, so I wanted to share the experience, and as a result you can find a Puppet module in the homer-puppet repo. In fact this is re-written from scratch from previous experiences, focusing on Debian/Ubuntu. Specific needs on other distributions can be addressed without much effort, so anybody deploying their infrastructure with Puppet and using Homer is encouraged to look at this work and provide feedback and questions.

Homer can be installed with a well tested homer-installer and through Docker containers, so this work just adds to the deployment opportunities, but as usual in this field, what fits for an organisation may not fit for another.

The approach is quite flexible. Most of the data has a default value so the minimum amount of data to be passed to the module - which of course can be done via hiera - is very limited and aims to allow people to configure a new system in minutes.

Homer has 4 main components: the DB of course, kamailio or opensips to collect data from the apps, a web server for the GUI  (homer-ui) and an API for the queries (homer-api). With homer-puppet you can git checkout the versions you need for homer-ui and homer-api and just launch puppet apply (standalone mode) to have everything installed and configured.

There is a default kamailio.cfg for storing data and providing stats, but that can be customised to your needs (see the modules/homer/files/kamailio folder inside the Puppet module).

Templates are used for the files containing variable elements (namely, mysql and admin credentials, and a few more).


I'm working on a version that instead of installing the components directly on the target host is designed to manage Docker containers (one for kamailio, one for the web part), through Docker Compose. There are many moving parts and while it fits well in a system that already includes a private Docker registry, it's trickier to "sanitise" and share. But I'm getting there.

Meanwhile, enjoy!

Tuesday, 7 June 2016

FreeSWITCH - Check what configuration directories are in use

There is a little trick to see what directories FreeSWITCH is using as paths for the configuration files:

/opt/freeswitch/bin/fs_cli -x 'global_getvar'| grep _dir

For example, the output can be:

base_dir=/usr/local/freeswitch
recordings_dir=/usr/local/freeswitch/recordings
sounds_dir=/usr/local/freeswitch/sounds
conf_dir=/opt/freeswitch/etc/freeswitch/
log_dir=/usr/local/freeswitch/log
run_dir=/usr/local/freeswitch/log
db_dir=/usr/local/freeswitch/db
mod_dir=/usr/local/freeswitch/mod
htdocs_dir=/usr/local/freeswitch/htdocs
script_dir=/usr/local/freeswitch/scripts
temp_dir=/tmp
grammar_dir=/usr/local/freeswitch/grammar
fonts_dir=/usr/local/freeswitch/fonts
images_dir=/usr/local/freeswitch/images
certs_dir=/usr/local/freeswitch/certs
storage_dir=/usr/local/freeswitch/storage
cache_dir=/usr/local/freeswitch/cache
data_dir=/usr/local/freeswitch
localstate_dir=/usr/local/freeswitch
internal_ssl_dir=/usr/local/freeswitch/conf/ssl
external_ssl_dir=/usr/local/freeswitch/conf/ssl 

This is handy in particular when you're testing an installation from source but the configuration is not in the default location.

It's possible to set non-default values by passing them as arguments for the daemon, e.g.:

/usr/local/freeswitch/bin/freeswitch -conf /opt/freeswitch/etc/freeswitch/ -log /usr/local/freeswitch/log -db /usr/local/freeswitch/db -ncwait -core

In general, when in doubt about the configuration path, use that fs_cli command to verify.

More info is as usual available from FreeSWITCH official documentation: Command Line Switches.

Friday, 27 May 2016

Continuous Integration and Kamailio

I presented a workshop at Kamailio World 2016. It focused on tools to help automate the build, deployment and testing of Kamailio-based applications using Jenkins, Docker and a few other technologies.

It was also an opportunity to show a sample usage of the new http_async_client module, designed to perform non-blocking HTTP queries from Kamailio.

The interested reader can find the slides here:



And if you have an hour to spare, here's the full video:




Any feedback or question you may have, please get in touch. I have a post on the event in progress, but there are so many things to highlight that it will require some more time.

Many thanks to Daniel and Elena-Ramona (more info here), event hosts, and Pascom.net for video streaming, recording and editing.


Wednesday, 3 February 2016

Extracting Opus from a pcap file into an audible wav

From time to time I need to verify that the audio inside a trace is as expected. Not much in terms of quality, but more often content and duration.

A few years ago I wrote a small program to transform a pcap into a wav file - the codec in use was SILK.

These days I'm dealing with Opus, and I have to say things are greatly simplified, in particular if you consider opus-tools, a set of utilities to handle opus files and traces.

One of those tools, opusrtp, can do live captures and write the interpreted payload into a .opus file.
Still, what I needed was to achieve the same result but from a pcap already existing, i.e. "offline".

So I came up with a small - quite shamelessly copy&pasted - patch to opusrtp, which is now in this fork.
Once you have a pcap with an RTP stream with opus (say in input.pcap) you can retrieve the .opus equivalent (in rtpdump.opus) with:

./opusrtp --extract input.pcap

Then you can generate an audible wav file with:

./opusdec --rate 8000 rtpdump.opus output.wav

Happy decoding.

Friday, 18 December 2015

TADHack mini Paris

"I’ve been following TADHack and its related events for some time, and finally this month I got the opportunity to attend TADHack-mini Paris. Participants can join remotely too, but the personal full immersion is something different (even, ironically, when the topic is Real Time Communications, and more in particular WebRTC and Telecom APIs).
We met in central Paris [...] "
This is the beginning of the behind-the-scenes story about my TADHack participation. You can read my full article here.
