Skip to main content

A gentle introduction to G.729

G.729 is an 8 Kpbs audio codec, standardized by ITU, and called also CS-ACELP: Coding of Speech using Coniugate-Structure Algebraic-Code-Excited Linear Prediction, as that’s the algorithm used for audio compression.

It has two extended versions: G.729A (optimized algorithm, slightly lower quality) and G.729B (extended features, higher quality), and it's popular in the VoIP world because combines very low bitrate with good quality (but alas is not royalty-free).

G.729A takes as input frames of voice of 10 msec of duration, sampled at 8KHz and with each sample having 16 bit:

This gives a frame size of 80 samples:
8000 sample/sec * 10 * 10e-3 sec/frame = 80 samples/frame

The output is 8 Kbps, so each encoded frame is represented by 10 Bytes:
8000 bit/sec * 10 * 10e-3 sec/frame = 80 bit/frame = 10 Bytes/frame

Quality
Considering its bit rate, G.729 has an excellent perceived quality (MOS).
Under normal network conditions G.729A has MOS 4.04 (while G.711 u-law, 64 kbps, has 4.45)
Under stressed network conditions G.729A has MOS 3.51 (while G.711 u-law, 64 kbps, has 4.13)
Perfect quality has MOS 5.

G.729 doesn’t support (reliably) DTMF (RFC 2833)

Algorithm delay and complexity
The delay between input and encoded output is 15 msec: 1 frame (10 msec) + 5 msec required by the look-ahead prediction algorithm.

Not surprising, such a low bitrate associated with high quality, G.729 has relatively high complexity, 15 (while G.711 has 1, and on the other extreme side, G.723.1 has 25).

VAD and CNG
G.729B has been extended with VAD (Voice Activity Detection, which causes silence suppression), and generates CNG (Comfort Noise Generation) packets. This helps the receiving end in two key elements:
1. Recover synchronization in condition of high latency network
2. Generate Comfort Noise (which in case of silence from the transmitting end, tells the receiver that the call is still up)

Next to come, a gentle description of ACELP and G.723.

Popular posts from this blog

Troubleshooting TURN

  WebRTC applications use the ICE negotiation to discovery the best way to communicate with a remote party. I t dynamically finds a pair of candidates (IP address, port and transport, also known as “transport address”) suitable for exchanging media and data. The most important aspect of this is “dynamically”: a local and a remote transport address are found based on the network conditions at the time of establishing a session. For example, a WebRTC client that normally uses a server reflexive transport address to communicate with an SFU. when running inside the home office, may use a relay transport address over TCP when running inside an office network which limits remote UDP targets. The same configuration (defined as “iceServers” when creating an RTCPeerConnection will work in both cases, producing different outcomes.

VoIP calls encoded with SILK: from RTP to WAV

SILK is a codec defined by Skype, but can be found in many VoIP clients, like CSipSimple . It comes in different flavours (sample rates and frame sizes), from narrowband (8 KHz) to wideband (24 KHz). Since Wireshark doesn't allow you to decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this has also allowed to me to earn a "tumbleweed" badge in stackoverflow . You may argue that a Wireshark plugin would be the right solution, but that's probably for another day. Initially I thought it was sufficient to read the specification for RTP payload when using SILK ; the truth is that I had to reverse engineer a solution by looking at SILK SDK's test vectors. There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("!#SILK"), but a slightly different one ("!#SILK_V3"). More importantly, each encoded frame is not preced...

Extracting Opus from a pcap file into an audible wav

From time to time I need to verify that the audio inside a trace is as expected. Not much in terms of quality, but more often content and duration. A few years ago I wrote a small program to transform a pcap into a wav file - the codec in use was SILK. These days I'm dealing with Opus , and I have to say things are greatly simplified, in particular if you consider opus-tools , a set of utilities to handle opus files and traces. One of those tools, opusrtp , can do live captures and write the interpreted payload into a .opus file. Still, what I needed was to achieve the same result but from a pcap already existing, i.e. "offline". So I come up with a small - quite shamlessly copy&pasted - patch to opusrtc, which is now in this fork . Once you have a pcap with an RTP stream with opus (say in input.pcap ) you can retrieve the .opus equivalent (in rtpdump.opus ) with: ./opusrtp --extract input.pcap Then you can generate an audible wav file with: ./opusd...