G.729 is an 8 kbps audio codec standardized by the ITU. It is also called CS-ACELP (Coding of Speech using Conjugate-Structure Algebraic-Code-Excited Linear Prediction), after the algorithm it uses for audio compression.
It has two widely used extensions: G.729A (Annex A, a reduced-complexity version of the algorithm with slightly lower quality) and G.729B (Annex B, which adds silence compression features). It's popular in the VoIP world because it combines a very low bitrate with good quality (but alas is not royalty-free).
G.729A takes as input voice frames of 10 msec, sampled at 8 kHz with 16 bits per sample.
This gives a frame size of 80 samples:
8000 samples/sec * 10 * 10^-3 sec/frame = 80 samples/frame
The output is 8 kbps, so each encoded frame is represented by 10 bytes:
8000 bit/sec * 10 * 10^-3 sec/frame = 80 bit/frame = 10 bytes/frame
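The frame arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the compression ratio is simply derived from the figures already given (16-bit input samples, 10-byte encoded frames).

```python
# Figures taken from the text; names are illustrative only.
SAMPLE_RATE_HZ = 8000       # input sampling rate
FRAME_DURATION_MS = 10      # duration of one frame
BITS_PER_SAMPLE = 16        # size of each input sample
OUTPUT_BITRATE_BPS = 8000   # encoded bit rate


def frame_sizes():
    """Return (samples, input bytes, output bits, output bytes) per frame."""
    samples_per_frame = SAMPLE_RATE_HZ * FRAME_DURATION_MS // 1000
    input_bytes = samples_per_frame * BITS_PER_SAMPLE // 8
    output_bits = OUTPUT_BITRATE_BPS * FRAME_DURATION_MS // 1000
    output_bytes = output_bits // 8
    return samples_per_frame, input_bytes, output_bits, output_bytes


samples, in_bytes, out_bits, out_bytes = frame_sizes()
print(samples, in_bytes, out_bits, out_bytes)  # 80 160 80 10
print(in_bytes // out_bytes)                   # 16 (input bytes per output byte)
```

So each 160-byte block of raw input becomes a 10-byte encoded frame, a 16:1 reduction.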
Considering its bit rate, G.729 has an excellent perceived quality, measured as MOS (Mean Opinion Score).
Under normal network conditions G.729A has MOS 4.04 (while G.711 u-law, at 64 kbps, has 4.45).
Under stressed network conditions G.729A has MOS 3.51 (while G.711 u-law, at 64 kbps, has 4.13).
Perfect quality has MOS 5.
G.729 doesn't reliably carry in-band DTMF tones, so DTMF is usually sent out of band instead (for example as RFC 2833 telephone events).
Algorithm delay and complexity
The delay between input and encoded output is 15 msec: 1 frame (10 msec) + 5 msec required by the look-ahead prediction algorithm.
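As a quick sanity check, the delay budget can be written out in Python, together with its equivalent in samples at the 8 kHz input rate. The names below are illustrative, and the figures are the ones quoted above.

```python
SAMPLE_RATE_HZ = 8000  # input sampling rate
FRAME_MS = 10          # one frame must be buffered before encoding
LOOKAHEAD_MS = 5       # extra audio needed by the look-ahead


def algorithmic_delay_ms(frame_ms=FRAME_MS, lookahead_ms=LOOKAHEAD_MS):
    """Delay introduced by the algorithm itself, in milliseconds."""
    return frame_ms + lookahead_ms


def ms_to_samples(ms, rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return rate_hz * ms // 1000


delay = algorithmic_delay_ms()
print(delay)                 # 15
print(ms_to_samples(delay))  # 120
```

This counts only the delay inherent in the algorithm; processing time, packetization and network transit add to it in a real call.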
Not surprisingly for a codec that pairs such a low bitrate with high quality, G.729 has relatively high complexity: 15 (while G.711 has 1 and, at the other extreme, G.723.1 has 25).
VAD and CNG
G.729B extends the codec with VAD (Voice Activity Detection, which enables silence suppression) and with the generation of CNG (Comfort Noise Generation) packets. This helps the receiving end in two key ways:
1. Recover synchronization on a high-latency network
2. Generate comfort noise (which, when the transmitting end is silent, tells the receiver that the call is still up)
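To give a feel for what voice activity detection does, here is a deliberately simplified sketch that labels each 80-sample frame as speech or silence by comparing its average energy to a fixed threshold. This is not the detector defined in G.729B, which is considerably more elaborate; the function names and the threshold value are assumptions chosen only for illustration.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of 16-bit PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def classify(frames, threshold=1_000_000.0):
    """Label each frame 'speech' or 'silence' (toy energy threshold)."""
    return [
        "speech" if frame_energy(f) >= threshold else "silence"
        for f in frames
    ]


# Two synthetic 80-sample frames: one loud, one nearly silent.
loud = [8000, -8000] * 40
quiet = [20, -20] * 40

print(classify([loud, quiet, quiet, loud]))
# ['speech', 'silence', 'silence', 'speech']
```

With silence suppression, a sender would encode the frames labelled as speech as usual, and during silence would send only occasional comfort noise information rather than a full frame every 10 msec.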
Next to come, a gentle description of ACELP and G.723.