Thursday, 29 September 2016

Opus negotiation for the practical man

Opus [0] is a versatile audio codec, with a variable sample rate and bitrate, suitable for both music and speech. It is defined in RFC 6716 [1] and required by WebRTC [2].

Opus can operate at various sample rates, from 8 kHz to 48 kHz, and at variable bitrates, from 6 kbit/s to 510 kbit/s.

The RTP payload format defined for Opus in RFC 7587 [3] explains the use of the media type parameters in SDP; this article analyzes them and shows in particular how "asymmetric streams" can be achieved.

This is an example of SDP defining an Opus offer or answer:

       m=audio 54312 RTP/AVP 101
       a=rtpmap:101 opus/48000/2
       a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
       maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
       a=ptime:40
       a=maxptime:40 


Let's clarify one thing immediately, about rtpmap.

rtpmap


As specified in RFC 7587 Ch. 7, the media subtype portion of rtpmap must always be 'opus/48000/2' (48000 samples/sec, 2 channels), regardless of the actual sample rate in use. So you can happily leave this configuration element out of your thoughts, even if you want to use a narrowband version of Opus.

e.g.:

a=rtpmap:96 opus/48000/2

Another less than intuitive aspect to clarify is how RTP timestamps are managed, given that RTP can represent audio with variable sample rates.

RTP timestamp


From RFC 7587, Ch. 4.1:

   Opus supports 5 different audio bandwidths, which can be adjusted
   during a stream.  The RTP timestamp is incremented with a 48000 Hz
   clock rate for all modes of Opus and all sampling rates.  The unit
   for the timestamp is samples per single (mono) channel.  The RTP
   timestamp corresponds to the sample time of the first encoded sample
   in the encoded frame.  For data encoded with sampling rates other
   than 48000 Hz, the sampling rate has to be adjusted to 48000 Hz.

This can be interpreted in this way: "The timestamp must always be set as if the sample rate is 48000 Hz."

Default case: the encoder is set at 48 kHz. A 20 msec frame contains 960 samples (48000 samples/sec * 20 msec).
When the encoder is set at 8 kHz, instead, a 20 msec frame contains 160 samples (8000 samples/sec * 20 msec). The timestamp in the RTP packet must be adapted, so that the sample rate is normalised to 48 kHz, by multiplying the number of samples by 6 (48000/8000).

In both cases, then, a 20 msec frame advances the RTP timestamp by 960 clock ticks.
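
This arithmetic is easy to get wrong, so here is a minimal sketch in Python (the constant and function names are mine, not from any Opus library):

    # Minimal sketch: RTP timestamp arithmetic for Opus.
    # Names are illustrative, not taken from any real library.

    RTP_CLOCK_HZ = 48000  # fixed for Opus, whatever the encoder rate

    def rtp_timestamp_increment(frame_ms):
        # Ticks to add to the RTP timestamp for one frame of frame_ms.
        return RTP_CLOCK_HZ * frame_ms // 1000

    # A 20 msec frame advances the timestamp by 960 ticks, whether the
    # encoder produced 160 samples (8 kHz) or 960 samples (48 kHz).
    assert rtp_timestamp_increment(20) == 960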

Now we start looking at the parameters that help the two parties in setting their encoders and decoders.

maxplaybackrate


From RFC 7587, Ch. 6.1:

     maxplaybackrate:  a hint about the maximum output sampling rate that
      the receiver is capable of rendering in Hz.  The decoder MUST be
      capable of decoding any audio bandwidth, but, due to hardware
      limitations, only signals up to the specified sampling rate can be
      played back.  Sending signals with higher audio bandwidth results
      in higher than necessary network usage and encoding complexity, so
      an encoder SHOULD NOT encode frequencies above the audio bandwidth
      specified by maxplaybackrate.  This parameter can take any value
      between 8000 and 48000, although commonly the value will match one
      of the Opus bandwidths (Table 1).  By default, the receiver is
      assumed to have no limitations, i.e., 48000.

This optional parameter is telling the encoder on the other side: "Since I won't be able to play at rates higher than `maxplaybackrate` you can save resources and bandwidth by limiting the encoding rate to this value."

A practical case is transcoding from Opus to G.711, where the final playback rate will in any case be 8000 Hz.
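
For instance, a hypothetical gateway that will transcode Opus to G.711 could advertise (payload type 101 is just an example):

       a=rtpmap:101 opus/48000/2
       a=fmtp:101 maxplaybackrate=8000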

sprop-maxcapturerate


The mirror-image (and equally optional) parameter is sprop-maxcapturerate, defined in RFC 7587 Ch. 6.1:

     sprop-maxcapturerate:  a hint about the maximum input sampling rate
      that the sender is likely to produce.  This is not a guarantee
      that the sender will never send any higher bandwidth (e.g., it
      could send a prerecorded prompt that uses a higher bandwidth), but
      it indicates to the receiver that frequencies above this maximum
      can safely be discarded.  This parameter is useful to avoid
      wasting receiver resources by operating the audio processing
      pipeline (e.g., echo cancellation) at a higher rate than
      necessary.  This parameter can take any value between 8000 and
      48000, although commonly the value will match one of the Opus
      bandwidths (Table 1).  By default, the sender is assumed to have
      no limitations, i.e., 48000.

This parameter is telling the decoder on the other side: "Since I won't produce audio at rates higher than `sprop-maxcapturerate`, you can save resources by running your receiving pipeline at no more than this rate."

A practical example is transcoding from G.711 to Opus, with the source always limited to a capture rate of 8000 samples/sec.
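
Continuing the hypothetical gateway example, the side capturing from a G.711 leg could signal both constraints at once:

       a=fmtp:101 maxplaybackrate=8000; sprop-maxcapturerate=8000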

maxaveragebitrate


An additional element, maxaveragebitrate, refers to the maximum average bitrate that the receiver will be able to manage. It is a hint that it is not worth the remote encoder's while to use higher bitrates, and that it can instead save resources.

From RFC 7587, Ch. 6.1:

     maxaveragebitrate:  specifies the maximum average receive bitrate of
      a session in bits per second (bit/s).  The actual value of the
      bitrate can vary, as it is dependent on the characteristics of the
      media in a packet.  Note that the maximum average bitrate MAY be
      modified dynamically during a session.  Any positive integer is
      allowed, but values outside the range 6000 to 510000 SHOULD be
      ignored.

This parameter is telling the remote encoder: "Since my decoder can't handle bitrates higher than maxaveragebitrate, you can save computation power and bandwidth by limiting your encoder bitrate to this value."

A practical example could be a mobile client that wants to ensure the download bandwidth is not saturated. Note that this value refers only to the initial negotiation (SDP offer/answer); the parties can renegotiate a different value during an active call.
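
For instance, a hypothetical mobile client could cap the incoming stream at 20 kbit/s with:

       a=fmtp:101 maxaveragebitrate=20000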

Asymmetric negotiation


Given the interpretations above, it also seems possible to negotiate asymmetric streams: the two entities involved can encode and decode at different rates when appropriate.

In particular, if we imagine an entity with local parameters:

maxplaybackrate=Da; sprop-maxcapturerate=Ea; maxaveragebitrate=Fa

and remote parameters:

maxplaybackrate=Db; sprop-maxcapturerate=Eb; maxaveragebitrate=Fb

then this entity can set its decoder at a sample rate of min(Da, Eb), and its encoder at a sample rate of min(Ea, Db) with bitrate capped at Fb.

Similarly and intuitively, the other entity involved can set its decoder at a sample rate of min(Db, Ea), and its encoder at a sample rate of min(Eb, Da) with bitrate capped at Fa.

All these values are optional, as mentioned above, so various permutations are possible here. In particular, when maxaveragebitrate is not provided it is assumed to be the maximum (510000 bit/s), and when the sample rate parameters are absent they default to 48000 Hz, as quoted above.
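
A minimal sketch of this logic in Python, with hypothetical names and the defaults just discussed:

    # Minimal sketch of asymmetric Opus negotiation; names are illustrative.

    DEFAULT_RATE = 48000        # default for maxplaybackrate / sprop-maxcapturerate
    DEFAULT_BITRATE = 510000    # effective maximum when maxaveragebitrate is absent

    def negotiate(local, remote):
        # local/remote are each side's fmtp parameters as dicts, e.g.
        # {"maxplaybackrate": 16000, "sprop-maxcapturerate": 16000,
        #  "maxaveragebitrate": 20000}; missing keys take the defaults.
        Da = local.get("maxplaybackrate", DEFAULT_RATE)
        Ea = local.get("sprop-maxcapturerate", DEFAULT_RATE)
        Db = remote.get("maxplaybackrate", DEFAULT_RATE)
        Eb = remote.get("sprop-maxcapturerate", DEFAULT_RATE)
        Fb = remote.get("maxaveragebitrate", DEFAULT_BITRATE)
        return {
            "decoder_rate": min(Da, Eb),   # what the peer will send us
            "encoder_rate": min(Ea, Db),   # what we should send the peer
            "encoder_bitrate": Fb,         # cap requested by the peer
        }

    # Example: we are wideband-capable, the peer is narrowband-limited.
    print(negotiate(
        {"maxplaybackrate": 48000, "sprop-maxcapturerate": 48000},
        {"maxplaybackrate": 8000, "sprop-maxcapturerate": 8000,
         "maxaveragebitrate": 20000},
    ))
    # -> {'decoder_rate': 8000, 'encoder_rate': 8000, 'encoder_bitrate': 20000}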

I hope this can clarify some subtleties, or at least open a discussion and lead to a better understanding of the topic.

References


[0] https://www.opus-codec.org/
[1] https://tools.ietf.org/html/rfc6716
[2] https://tools.ietf.org/html/draft-ietf-rtcweb-audio-10
[3] https://tools.ietf.org/html/rfc7587
 

Saturday, 17 September 2016

Deploying Homer with Puppet

Fan of Homer? So am I, and as sometimes happens I'm a fan who could join the team!

If despite the title of this post you're still reading, then it's a good sign and we can move on.

Homer is a vast project that aims to provide a tool, with a GUI, to correlate all the signalling, RTCP stats, events, and logs in your RTC network. It focuses heavily on SIP, for historical reasons, but it's also an extensible framework to store other types of signalling, correlate data, and compute statistics. People browsing their GitHub account are often heard saying "Do they have this too? And this? Wow!".

It is compatible off the shelf with common applications like Kamailio, OpenSIPS, FreeSWITCH, and Asterisk, so if you're into VoIP, adding Homer to your platform is as easy as installing it and telling your apps where to send their data. There are also standalone tools, like captagent, and nodejs apps to parse and collect specific logs, to be associated with the related signalling, and a plethora of libraries, including a C one.

Anyway the topic is extremely vast and you can find a lot (a lot) of information on the sipcapture website.

Lately I've been working on Homer deployments using Puppet, a Configuration Management tool, so I wanted to share the experience; as a result you can find a Puppet module in the homer-puppet repo. This is in fact rewritten from scratch from previous experiences, focusing on Debian/Ubuntu. Specific needs on other distributions can be addressed without much effort, so anybody deploying their infrastructure with Puppet and using Homer is encouraged to look at this work and provide feedback and questions.

Homer can be installed with a well-tested homer-installer and through Docker containers, so this work just adds to the deployment opportunities; as usual in this field, what fits one organisation may not fit another.

The approach is quite flexible. Most settings have a default value, so the minimum amount of data to be passed to the module - which of course can be done via Hiera - is very limited, and the aim is to allow people to configure a new system in minutes.

Homer has 4 main components: the DB of course, Kamailio or OpenSIPS to collect data from the apps, a web server for the GUI (homer-ui), and an API for the queries (homer-api). With homer-puppet you can git checkout the versions you need for homer-ui and homer-api and just launch puppet apply (standalone mode) to have everything installed and configured.
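
As a sketch only - the key names below are hypothetical placeholders, while the real parameter names are the ones defined in the homer-puppet repo - a minimal Hiera file and a standalone run could look like:

       # Hiera data - hypothetical keys, check the module for the real names
       homer::mysql_password: 'secret'
       homer::admin_password: 'secret'

       # then, on the target host (assuming the module provides a 'homer' class):
       puppet apply --modulepath=/etc/puppet/modules -e 'include homer'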

There is a default kamailio.cfg for storing data and providing stats, but that can be customised to your needs (see the modules/homer/files/kamailio folder inside the Puppet module).

Templates are used for the files containing variable elements (namely, MySQL and admin credentials, and a few more).


I'm working on a version that, instead of installing the components directly on the target host, is designed to manage Docker containers (one for Kamailio, one for the web part) through Docker Compose. There are many moving parts, and while it fits well in a system that already includes a private Docker registry, it's trickier to "sanitise" and share. But I'm getting there.

Meanwhile, enjoy!
