There are various levels of testing, from unit to component, from integration to end-to-end, not to mention performance testing and fuzzing.
When developing or maintaining Real Time Communications (RTC or VoIP) systems, all these levels (with the possible exception of unit testing) are made easier by applications explicitly designed for this, like sipp.
sipp has a deep focus on performance testing or, to use a simpler term, load testing. Some of its features let you fine-tune properties like call rate and call duration, simulate packet loss, ramp up traffic, etc. In practical terms, though, once you have the flexibility to generate SIP signalling to negotiate sessions and RTP streams, you can use sipp for functional testing too.
sipp can act as an entity generating a call, or receiving a call, which makes it suitable to surround the system under test and simulate its interactions with the real world.
What sipp does can be generalised: we want to be able to simulate the real world that surrounds (or will surround) our system in Production. From this point of view sipp is not the only answer, and projects often use other tools, or a combination of other tools.
One simple and effective approach is re-using RTC applications and building the testing tool around them. When a system is built around an application, it's likely that the people working on it are familiar enough with the application to re-use it to mock the external world. This is often achieved with Asterisk or FreeSWITCH. They both expose an API for originating calls, and can certainly play the role of called party (or "absorbers", or "parrots", depending on their main scope and terminology).
Kamailio can also be used to generate calls, even though its core focus on signalling makes it slightly more complex to use in generic cases.
Unless behavioural changes are put in place, such solutions imply compromising on the SIP stacks in use. Asterisk or FreeSWITCH won't make it too easy to generate an INVITE with a wrongly formatted SIP header, for example, while sipp is much more flexible, and the SIP messages can be crafted down to the single character. What typically happens is that sipp is used to generate or receive calls when specific syntax requirements for the signalling are needed, while Asterisk and FreeSWITCH can be used in more permissive cases, where what's important is a generic session establishment.
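To make "down to the single character" concrete, here is a minimal sketch in plain Python (not a sipp scenario file): it assembles a bare INVITE by hand, so that any header can be broken on purpose. The helper name build_invite, the addresses and the non-numeric Max-Forwards value are assumptions made up for this illustration; none of them come from sipp, Asterisk or FreeSWITCH.

```python
import uuid


def build_invite(caller, callee, local_ip="192.0.2.10", local_port=5060,
                 max_forwards="70"):
    """Assemble a bare INVITE (no SDP body) as bytes.

    Hypothetical helper for illustration: every header is a plain string,
    so a test can corrupt any of them deliberately.
    """
    user = caller.split("@")[0]
    # Branch values start with the "z9hG4bK" magic cookie (RFC 3261).
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        "INVITE sip:%s SIP/2.0" % callee,
        "Via: SIP/2.0/UDP %s:%d;branch=%s" % (local_ip, local_port, branch),
        "Max-Forwards: %s" % max_forwards,
        "From: <sip:%s>;tag=%s" % (caller, uuid.uuid4().hex[:8]),
        "To: <sip:%s>" % callee,
        "Call-ID: %s" % uuid.uuid4().hex,
        "CSeq: 1 INVITE",
        "Contact: <sip:%s@%s:%d>" % (user, local_ip, local_port),
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line separates headers from body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# A well-formed request, and one with a non-numeric Max-Forwards value.
good = build_invite("alice@example.com", "bob@example.com")
bad = build_invite("alice@example.com", "bob@example.com",
                   max_forwards="seventy")
```

The resulting bytes could be sent over a UDP socket to the system under test. In practice sipp gives the same level of control over the message text, and also takes care of handling the responses.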
When dealing with media (typically, RTP streams) is necessary, sipp provides at least two methods: re-playing an RTP stream from a trace (pcap file), or encoding a WAV file into a stream. Recently sipp added the ability to play RTP Events separately (DTMF tones as in RFC 2833 - I think the first patch with this functionality was this). sipp is not able to transcode or generate non-PCM streams, but it can still play a non-PCM stream with some limitations, which covers most typical cases.
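For reference, the payload of an RTP Event is small enough to build and decode by hand. The sketch below follows the 4-byte telephone-event layout of RFC 2833 (event, end bit, volume, duration); the function names, the default volume and the 8000 Hz clock rate are assumptions chosen for the example, and the RTP header itself is left out.

```python
import struct

# DTMF event codes from RFC 2833: 0-9, then *, #, A-D.
DTMF_EVENTS = {c: i for i, c in enumerate("0123456789*#ABCD")}


def pack_rtp_event(digit, duration_ms, end=False, volume=10, clock_rate=8000):
    """Build the 4-byte telephone-event payload (RTP header not included)."""
    event = DTMF_EVENTS[digit.upper()]
    # Duration is expressed in RTP timestamp units, not milliseconds.
    duration = duration_ms * clock_rate // 1000
    if not 0 <= volume <= 63 or not 0 <= duration <= 0xFFFF:
        raise ValueError("volume or duration out of range")
    # First bit is the E (end) flag, the reserved bit stays 0, then 6 bits of volume.
    flags = (0x80 if end else 0) | volume
    return struct.pack("!BBH", event, flags, duration)


def unpack_rtp_event(payload):
    """Decode a telephone-event payload into a small dict."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {"event": event, "end": bool(flags & 0x80),
            "volume": flags & 0x3F, "duration": duration}
```

For example, pack_rtp_event("5", 100, end=True) describes the final packet of a 100 ms tone for digit 5, and unpack_rtp_event() can be applied to payloads extracted from a trace.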
Less generic scenarios where RTC applications like Asterisk and FreeSWITCH can be useful are the ones requiring SRTP (encrypted RTP). Even though sipp can be used to negotiate SRTP, by adapting the SDP portion of the offer/answer, it doesn't provide a solution to generate SRTP streams.
In this case a very useful item to add to your toolbox is pjsip, which is a SIP stack library (also used by Asterisk, with chan_pjsip being the current recommended SIP channel, as opposed to the older chan_sip) that exposes an API and also a command-line application (pjsua).
pjsua can be used directly, with either command-line arguments or a configuration file, or it's possible to use the pjsip library to write programs in languages like Python: this makes it very flexible and helps its integration in existing and new testing systems.
With pjsip, it's possible to generate calls that play audio and DTMF tones, in a similar way to sipp, but also to encrypt RTP and establish SRTP streams.
pjsua
The easiest approach is to build the pjsip project and use the pjsua binary (you can see a procedure in the Appendix).
pjsua accepts command-line arguments, but can also read them from a configuration file, which makes them easier to read. For example you could just run:
# pjsua --config-file pjsua.cfg
where pjsua.cfg contains just the caller and callee:
sip:bob@example.com
--id=sip:alice@example.com
A more sophisticated configuration file contains instructions on codecs and encryption, e.g.
sip:bob@example.com
--id=sip:alice@example.com
--use-srtp=0
--srtp-secure=0
--realm=*
--log-level=6
--no-vad
--dis-codec GSM
--dis-codec H263
--dis-codec iLBC
--dis-codec G722
--dis-codec speex
--dis-codec pcmu
--dis-codec pcma
--dis-codec opus
--add-codec pcma
--null-audio
--auto-play
--play-file /some_audio.wav
Since I mentioned SRTP as a possible key element for using pjsip, let's look into the related options:
--use-srtp=0
--srtp-secure=0
'use-srtp' can be 0, 1 or 2, and means "disabled", "optional" and "mandatory", respectively.
With "optional" pjsua offers both plain and encrypted RTP at the same time, and the callee entity can decide. With "mandatory" it will only offer SRTP, and the callee will have to either accept or reject.
'srtp-secure' refers to the use of TLS for signalling, and can also be 0, 1 or 2, meaning "not required", "use tls" and "use sips", respectively. Needless to say, in normal scenarios you want to protect the SRTP crypto information carried in the SDP, so you want to encrypt signalling too. SIP over TLS is the typical solution. For testing purposes you may prefer to make it easier to check the content of signalling, and use 'srtp-secure=0'.
'no-vad' formally should be used to disable silence detection; in practice you want this option when generating a call from a machine that doesn't have a sound card.
Similarly, 'null-audio' removes the need for a sound device to play the audio, which is required when the calls are generated from a host with no sound interfaces.
'dis-codec' is used to disable a codec from the negotiation, and 'add-codec' instead selects a codec to be added to the offer. This adds flexibility, and it's also worth noting that video codecs are available too.
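When many test cases differ only in a few of these options, the configuration file can be generated rather than written by hand. The following is a minimal sketch that uses only the options discussed above; the function name build_pjsua_config and its parameters are hypothetical, and the accepted values are limited to the ones described in this article.

```python
def build_pjsua_config(callee, caller, use_srtp=0, srtp_secure=0,
                       codec="pcma",
                       disabled=("GSM", "iLBC", "G722", "speex", "pcmu"),
                       play_file=None):
    """Return the text of a pjsua configuration file, one option per line."""
    if use_srtp not in (0, 1, 2):
        raise ValueError("use_srtp: 0 (disabled), 1 (optional) or 2 (mandatory)")
    if srtp_secure not in (0, 1, 2):
        raise ValueError("srtp_secure: 0 (not required), 1 (tls) or 2 (sips)")
    lines = [
        callee,
        "--id=%s" % caller,
        "--use-srtp=%d" % use_srtp,
        "--srtp-secure=%d" % srtp_secure,
        "--no-vad",
        "--null-audio",
    ]
    # Disable everything except the codec we want in the offer.
    lines += ["--dis-codec %s" % name for name in disabled if name != codec]
    lines.append("--add-codec %s" % codec)
    if play_file:
        lines += ["--auto-play", "--play-file %s" % play_file]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file such as pjsua.cfg and passed to pjsua with --config-file, so that a test suite only has to vary, say, use_srtp between runs.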
Using pjsip library with python
It's possible to use the pjsip library's API with high-level programming languages like Python. This makes test automation quite versatile. I remember seeing this approach as early as 2012, when the project I was working on had its client applications built on top of pjsip: it was extremely valuable to programmatically simulate the clients from Linux machines.
Being designed for interactive applications, pjsip comes with a nice event-based model, so in principle you need to trigger the desired actions and register callback functions that will be called at the proper moment.
A complete reference to the python library can be found here.
In general, you first import the library:
import pjsua as pj
then a library object is created, the configuration objects are populated, and a call is triggered, e.g.:
lib = pj.Lib()
media_cfg = pj.MediaConfig()
media_cfg.no_vad = 0
lib.init(log_cfg = pj.LogConfig(level=3, callback=log_cb), media_cfg=media_cfg)
lib.set_null_snd_dev()
lib.set_codec_priority("GSM", 0)
lib.set_codec_priority("iLBC", 0)
lib.set_codec_priority("G722", 0)
lib.set_codec_priority("speex", 0)
lib.set_codec_priority("pcmu", 0)
lib.set_codec_priority("pcma", 1)
transport = lib.create_transport(pj.TransportType.UDP)
lib.start()
acc = lib.create_account_for_transport(transport)
call = acc.make_call(sys.argv[1], MyCallCallback(), hdr_list=custom_headers)
You can see that calling set_codec_priority with a priority of 0 is equivalent to the --dis-codec command line option.
MyCallCallback() creates the callback object whose methods will be invoked at each change of call state or media state; the current state is read from self.call.info(). You'll have something like:
class MyCallCallback(pj.CallCallback):
    def __init__(self, call=None):
        pj.CallCallback.__init__(self, call)
    def on_state(self):
        ...
        if self.call.info().state == pj.CallState.CONFIRMED:
            # The call has been answered
            # Here you can create a player to generate audio into an RTP stream, send DTMF, log information, etc
            # You can even invoke other APIs to interact with more complex systems
            ...
    def on_media_state(self):
        global lib
        if self.call.info().media_state == pj.MediaState.ACTIVE:
            ...
            # Media is now flowing, so you can connect it to the internal conference object
            # Connect the call to sound device
            call_slot = self.call.info().conf_slot
            lib.conf_connect(call_slot, 0)
            lib.conf_connect(0, call_slot)
            print "on_media_state - MediaState ACTIVE"
As you'd expect, exceptions can be caught and errors displayed:
except pj.Error, e:
    print "Exception: " + str(e)
    lib.destroy()
    lib = None
    sys.exit(1)
If you happen to need DTMF tones, pjsip offers the dial_dtmf() function, as part of the Call object, which takes the digits to send as a string, e.g. call.dial_dtmf("1234").
Just remember that these calls are asynchronous and non-blocking: you need to explicitly add a delay to separate the beginning of a tone from other actions.
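One way to handle the delay is to wrap it in a small helper. This is only a sketch: it assumes call is the object returned by make_call() and relies on nothing more than its dial_dtmf() method; the helper name dial_digits and the half-second gap are arbitrary choices to adapt to the tone duration you need.

```python
import time


def dial_digits(call, digits, gap_seconds=0.5, sleep=time.sleep):
    """Send digits one at a time, pausing after each one.

    dial_dtmf() does not block, so the pause is what keeps a tone
    separate from the next action.
    """
    sent = []
    for digit in digits:
        call.dial_dtmf(digit)
        sleep(gap_seconds)
        sent.append(digit)
    return sent
```

The sleep parameter is injectable so that the helper can be exercised in a unit test without waiting; in a real scenario you would call it from the CONFIRMED branch of on_state(), e.g. dial_digits(self.call, "1234").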
pjsip will generate proper RTP Event packets of the given duration, inside the existing RTP stream (and so they will have the same SSRC and proper timestamp reference).
I'll write about analysing pcap traces to extract information on RTP events in a separate article.
Wrap up
This article is more or less what I would have wanted to read on the topic some time ago; instead I had to infer it from various sources and after various experiments. I hope it will be useful to some of the readers.
Appendix - pjsua build and install
To build pjsua on Debian you can do something like:
apt install python-dev gcc make binutils build-essential libasound2-dev wget
wget https://github.com/pjsip/pjproject/archive/2.10.tar.gz
tar -xvf 2.10.tar.gz
cd pjproject-2.10
export CFLAGS="$CFLAGS -fPIC"
./configure && make dep && make
The binary will be available at ./pjsip-apps/bin/pjsua-x86_64-unknown-linux-gnu, which of course you can link to something easier to use, or copy to a directory in the PATH.
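Since the suffix of the binary name depends on the build host, a test harness may prefer to look it up rather than hard-code it. A minimal sketch, assuming the directory layout above; the helper names find_pjsua and link_pjsua are hypothetical.

```python
import glob
import os


def find_pjsua(build_dir):
    """Return the path of the first pjsua-* binary under build_dir, or None."""
    pattern = os.path.join(build_dir, "pjsip-apps", "bin", "pjsua-*")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def link_pjsua(build_dir, link_path):
    """Create a symlink with an easier name pointing at the built binary."""
    binary = find_pjsua(build_dir)
    if binary is None:
        raise IOError("no pjsua binary found under %s" % build_dir)
    os.symlink(os.path.abspath(binary), link_path)
    return link_path
```

For example, link_pjsua("pjproject-2.10", "/usr/local/bin/pjsua") would make the binary available as pjsua, provided the target directory is writable and in the PATH.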