Thursday 27 October 2022

About ICE negotiation

Disclaimer: I wrote this article in March 2022 while working with Subspace, and the original link is here: https://subspace.com/resources/ice-negotiation . This post in my personal blog is a way to ensure it doesn't get lost. There is nothing service-specific in it, I've made only minor edits and I hope it can be a good technical reference on the topic.



WebRTC is a set of protocols that allow applications, typically running on Web browsers, to exchange media (audio, video, data) with other entities.

Before media can flow, however, the WebRTC entities need to discover what types of connection are possible and, among those, which one is best. This needs to happen as fast as possible, so that users perceive the service as instantaneous.

WebRTC includes protocols like STUN and TURN that are designed to facilitate the establishment of connections when a direct connection is not possible. The typical case is a computer inside a home or office network, with a private IP address, and able to reach the public Internet only through Network Address Translation (NAT).

STUN helps in discovering the IP address and port from where a computer enters the Internet, and in some circumstances that IP address and port can be used by other entities to reach that computer. STUN is also used for keeping such bindings alive.

TURN provides a way for two entities to communicate when they are behind two different symmetric NATs, or when one is behind a firewall that restricts outbound traffic to only some UDP or TCP ports. TURN uses STUN as the underlying protocol, adding requests, responses and indications to accomplish media relay.

STUN and TURN play a role in the ICE negotiation process. ICE, Interactive Connectivity Establishment, is a protocol that allows the dynamic discovery of the best way to establish a connection for entities that may be behind NAT.

All WebRTC clients use ICE before media can flow.

There are three main phases: the gathering of candidates, the connectivity checks, and the nomination of the candidate pairs to be used.

The ICE candidates are simply transport addresses (IP address, port and transport type) that can potentially be used to communicate (send and receive media) and that the ICE client collects and shares with the other party through some form of signalling.

There are three main types of ICE candidates: 'host', 'server reflexive' and 'relay'.

'host' candidates refer to transport addresses that are directly visible to the client, where the client can start listening for incoming connections or packets. Computers behind NAT may only have private IP addresses as 'host' candidates, but these are potentially usable if the other party belongs to the same network, or depending on the type of NAT.

'server reflexive' candidates are the ones discovered through the interaction with a STUN server. The client sends a Binding Request to the STUN server and receives a Binding Success Response with an XOR-MAPPED-ADDRESS attribute containing the source IP address and port from which the request was received. There may be more than one level of NAT, and that transport address represents the outermost one.

'relay' candidates refer to allocations reserved on a TURN server. The ICE client requests an Allocation of a relay, and after successful authentication the TURN server provides an XOR-RELAYED-ADDRESS containing the transport address allocated on that server for the client.

These types of ICE candidates have different priorities, 'host' having the highest priority and 'relay' the lowest; this is a way to favour direct interconnection when possible (although a direct path does not necessarily give the best connection quality or, more generally, the best Quality of Experience).
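The ordering comes from the candidate priority formula defined in RFC 8445. The following is a minimal sketch that assumes the RFC's recommended type preferences (126 host, 110 peer reflexive, 100 server reflexive, 0 relay), a single network interface (local preference 65535) and component 1 (RTP). Browsers are free to choose other values, so this is only an illustration.

```python
# Recommended type preferences from RFC 8445 (implementations may differ)
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


for t in ("host", "srflx", "relay"):
    print(t, candidate_priority(t))
```

With these inputs the three printed values are 2130706431, 1694498815 and 16777215. Because the type preference occupies the top byte, it dominates the ordering.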

This diagram shows a typical process where ICE candidates are gathered and sent to the other party:


In this example the candidates are communicated as soon as they are retrieved. This technique is called Trickle ICE and was designed to ensure that the connectivity checks can happen as soon as the candidates are available.

Without Trickle ICE, the WebRTC client would need to wait for all the candidates to be collected before sending an offer or an answer, increasing the session setup time.

The candidates are transmitted over a signalling system established between the two parties. This is outside the scope of the WebRTC specifications and is application-specific.

The WebRTC client will then receive the ICE candidates from the other party, and it will build a list of “candidate pairs”: each local candidate will be paired with each remote candidate.
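The pairing step can be sketched as follows. This is a simplified illustration that assumes candidates are plain (IP, port, priority) tuples, and the addresses in it are hypothetical. RFC 8445 also pairs only candidates with the same component ID and IP address family (only the family check is shown). It then orders the pairs by a pair priority, where G is the controlling agent's candidate priority and D is the controlled agent's.

```python
import ipaddress
from itertools import product


def pair_priority(g: int, d: int) -> int:
    """RFC 8445 pair priority: 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D ? 1 : 0)."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def form_pairs(local, remote, controlling):
    """Pair every local candidate with every remote candidate of the same IP family.

    local and remote are lists of (ip, port, priority) tuples.
    Returns (local_addr, remote_addr, pair_priority) tuples, highest priority first.
    """
    pairs = []
    for (lip, lport, lprio), (rip, rport, rprio) in product(local, remote):
        if ipaddress.ip_address(lip).version != ipaddress.ip_address(rip).version:
            continue  # an IPv4 candidate is never paired with an IPv6 one
        g, d = (lprio, rprio) if controlling else (rprio, lprio)
        pairs.append(((lip, lport), (rip, rport), pair_priority(g, d)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical candidates: a local host and relay, a remote IPv4 and IPv6 host
local = [("192.168.1.10", 50000, 2130706431), ("203.0.113.5", 60000, 16777215)]
remote = [("198.51.100.2", 40000, 2130706431), ("2001:db8::1", 40002, 2130706431)]
for pair in form_pairs(local, remote, controlling=True):
    print(pair)
```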

After this operation the connectivity checks can begin.

Let’s see it with an example, assuming UDP as transport for all cases. The WebRTC client has a local host candidate, IP1:port1, and has received a remote host candidate, IP2:port2. It builds a “candidate pair” with the two: {IP1:port1, IP2:port2}.

The WebRTC client will start sending STUN Binding Requests with source IP1:port1 and destination IP2:port2. These requests use STUN short-term authentication: they carry a USERNAME and a MESSAGE-INTEGRITY attribute computed with a password, both built from the ICE credentials (ice-ufrag and ice-pwd) that were previously exchanged inside the SDP offer/answer.
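Such a Binding Request can be encoded as in the following minimal sketch. It assumes the RFC 8445 convention: the USERNAME is the remote ufrag and the local ufrag joined by a colon, and the HMAC-SHA1 key is the remote party's ice-pwd. Several things are deliberately omitted: the PRIORITY, ICE-CONTROLLING/ICE-CONTROLLED and FINGERPRINT attributes that real ICE checks carry, and any password preprocessing. The ufrag and password values are hypothetical.

```python
import hashlib
import hmac
import os
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
ATTR_MESSAGE_INTEGRITY = 0x0008


def stun_attr(attr_type: int, value: bytes) -> bytes:
    """Encode a STUN attribute as type/length/value, padded to a 4-byte boundary."""
    padding = (4 - len(value) % 4) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * padding


def build_binding_request(remote_ufrag: str, local_ufrag: str, remote_pwd: str,
                          transaction_id: Optional[bytes] = None) -> bytes:
    """Binding Request with USERNAME and a short-term MESSAGE-INTEGRITY."""
    transaction_id = transaction_id or os.urandom(12)  # 96-bit transaction ID
    body = stun_attr(ATTR_USERNAME, f"{remote_ufrag}:{local_ufrag}".encode("utf-8"))
    # The header length covered by the HMAC already counts the 24-byte
    # MESSAGE-INTEGRITY attribute (4-byte attribute header + 20-byte HMAC-SHA1).
    header = struct.pack("!HHI", BINDING_REQUEST, len(body) + 24, MAGIC_COOKIE) + transaction_id
    mac = hmac.new(remote_pwd.encode("utf-8"), header + body, hashlib.sha1).digest()
    return header + body + stun_attr(ATTR_MESSAGE_INTEGRITY, mac)
```

The receiving side recomputes the same HMAC with its own ice-pwd. A mismatch means the request is rejected instead of answered with a Binding Success Response.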

If the Binding Request reaches the other party on IP2:port2, then the other party will authenticate the request and respond with a Binding Success Response, including the XOR-MAPPED-ADDRESS attribute, which contains the source transport address of the request as seen by the other party.

If the Binding Success Response reaches the WebRTC client, then the client will match it to the request through its transaction ID and mark the candidate pair as Succeeded, and therefore suitable for exchanging media.

If for any reason the Binding Success Response is not received, then the candidate pair will remain in the In Progress state for some time, and then move to Failed: that pair cannot be used to exchange media.
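The bookkeeping described in the last two paragraphs can be sketched with a small hypothetical helper. It matches responses to outstanding checks by transaction ID and fails the checks that time out. Real ICE agents retransmit requests with backoff instead of using one fixed timeout, so the 5-second value is purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]


class PairState(Enum):
    WAITING = "Waiting"
    IN_PROGRESS = "In Progress"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class CandidatePair:
    local: Address
    remote: Address
    state: PairState = PairState.WAITING


class CheckTracker:
    """Hypothetical helper: matches Binding responses to checks by transaction ID."""

    def __init__(self, timeout: float = 5.0):
        self.timeout = timeout  # illustrative value, not taken from any spec
        self.pending: Dict[bytes, Tuple[CandidatePair, float]] = {}

    def sent(self, pair: CandidatePair, transaction_id: bytes, now: float) -> None:
        pair.state = PairState.IN_PROGRESS
        self.pending[transaction_id] = (pair, now)

    def response(self, transaction_id: bytes) -> Optional[CandidatePair]:
        entry = self.pending.pop(transaction_id, None)
        if entry is None:
            return None  # not one of our outstanding checks: ignore it
        pair, _sent_at = entry
        pair.state = PairState.SUCCEEDED
        return pair

    def expire(self, now: float) -> None:
        """Move every check older than the timeout to Failed."""
        for tid, (pair, sent_at) in list(self.pending.items()):
            if now - sent_at >= self.timeout:
                pair.state = PairState.FAILED
                del self.pending[tid]
```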



It’s important to note that this is symmetrical: the other party too can start the connectivity check from IP2:port2 towards IP1:port1. To avoid a conflict, the parties assume the role of "controlling" or "controlled" agent. The controlling agent will be the one deciding which candidate pair will be used for exchanging media.

Before media can flow through a TURN server, a client must create a Permission. This is important during ICE connectivity checks.

For ICE candidates of type 'relay', the connectivity check will be performed by sending Binding Requests that traverse the TURN server and reach the other party on the relay side. The Binding Requests will be carried by a Send Indication, with the remote candidate as peer address. The TURN server will only accept it and relay it to the destination if a Permission has been granted for that peer.
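The encapsulation can be sketched as follows. The payload (for example a Binding Request) is placed in a DATA attribute, and the peer goes in an XOR-PEER-ADDRESS attribute, XOR-ed with the STUN magic cookie as RFC 5766 requires. This minimal sketch handles IPv4 peers only, and the address used in the usage check is hypothetical.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442
SEND_INDICATION = 0x0016        # method Send (0x006), class indication
ATTR_XOR_PEER_ADDRESS = 0x0012
ATTR_DATA = 0x0013


def stun_attr(attr_type: int, value: bytes) -> bytes:
    """Encode a STUN attribute as type/length/value, padded to a 4-byte boundary."""
    padding = (4 - len(value) % 4) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * padding


def xor_ipv4_address(ip: str, port: int) -> bytes:
    """XOR-encode an IPv4 transport address with the STUN magic cookie."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    return struct.pack("!BBHI", 0, 0x01, x_port, x_addr)  # reserved byte, family IPv4


def build_send_indication(peer_ip: str, peer_port: int, payload: bytes,
                          transaction_id: bytes) -> bytes:
    """Wrap a payload (e.g. a Binding Request) for relaying to peer_ip:peer_port."""
    body = (stun_attr(ATTR_XOR_PEER_ADDRESS, xor_ipv4_address(peer_ip, peer_port))
            + stun_attr(ATTR_DATA, payload))
    header = struct.pack("!HHI", SEND_INDICATION, len(body), MAGIC_COOKIE)
    return header + transaction_id + body
```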



(NAT has been omitted in this diagram for simplicity)

Of course a TURN allocation must exist for the CreatePermission request to succeed, but that’s already been created during the candidate gathering phase.

If the candidate pair selected for exchanging media is one with a local 'relay' candidate, then typically the WebRTC client binds a TURN Channel to the other party, and starts exchanging media using ChannelData messages, instead of Send/Data Indications.

There are more details that can be discussed, like managing timeouts, role conflicts, ICE Lite, etc., which we will address in other articles. One additional aspect is important here: we mentioned three types of candidates, host, server reflexive and relay, but there’s a fourth one, "peer reflexive".

Peer reflexive candidates are not provided directly by a party during candidate exchange, but are instead discovered dynamically during the connectivity checks. Getting back to the previous example with a candidate pair {IP1:port1, IP2:port2}: depending on NAT conditions, the Binding Request that the other party sends from IP2:port2 can arrive from a different source, IP3:port3, that was never signalled as a candidate. Similarly, the XOR-MAPPED-ADDRESS in a Binding Success Response can reveal a local transport address that was not among the gathered candidates.

The WebRTC client can verify that the request from IP3:port3 is valid by checking its MESSAGE-INTEGRITY against the short-term credentials. If the request is valid, the client will dynamically add a remote candidate of type peer reflexive. Sometimes the peer reflexive candidate is the only suitable one and will be used to exchange media.
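A minimal sketch of how a remote peer reflexive candidate could be learned follows. `on_binding_request` is a hypothetical helper, and it assumes the MESSAGE-INTEGRITY check has already been done elsewhere. Following RFC 8445, the new candidate takes its priority from the PRIORITY attribute of the incoming request.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    ip: str
    port: int
    cand_type: str  # "host", "srflx", "relay" or "prflx"
    priority: int


def on_binding_request(remote_candidates: List[Candidate], source: Tuple[str, int],
                       priority_attr: int, integrity_ok: bool) -> Optional[Candidate]:
    """Return the remote candidate a valid Binding Request came from,
    learning a new peer reflexive candidate if the source was never signalled."""
    if not integrity_ok:
        return None  # requests failing MESSAGE-INTEGRITY are rejected, not learned
    for cand in remote_candidates:
        if (cand.ip, cand.port) == source:
            return cand
    # Unknown source: the priority comes from the request's PRIORITY attribute.
    prflx = Candidate(source[0], source[1], "prflx", priority_attr)
    remote_candidates.append(prflx)
    return prflx
```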

Chrome’s chrome://webrtc-internals, Firefox’s about:webrtc and Safari's "WebRTC Logging" will show the list of candidates and the pair that was selected, so those tools are of great value when troubleshooting.

If you take a trace on the computer running the WebRTC client, you’ll be able to see the STUN Binding Requests and Responses, and CreatePermission, Send/Data Indications for connectivity checks if unencrypted TURN is used. Wireshark will filter those messages for you if you use the ‘stun’ filter, and will also be able to interpret the Binding Request/Response carried inside Send/Data Indications (and also the RTP streams, but that’s for another article).

If you're interested in troubleshooting TURN sessions, take a look at this other article, "Troubleshooting TURN".

Monday 16 May 2022

Troubleshooting TURN

 

WebRTC applications use ICE negotiation to discover the best way to communicate with a remote party. It dynamically finds a pair of candidates (IP address, port and transport, also known as “transport address”) suitable for exchanging media and data.


The most important aspect of this is “dynamically”: a local and a remote transport address are found based on the network conditions at the time of establishing a session. For example, a WebRTC client that normally uses a server reflexive transport address to communicate with an SFU when running in a home office may use a relay transport address over TCP when running inside an office network that restricts outbound UDP traffic. The same configuration (defined as “iceServers” when creating an RTCPeerConnection) will work in both cases, producing different outcomes.


This means that a certain portion of WebRTC sessions happen over TURN, i.e. they are relayed through a TURN service, when the choice is left to the client. ‘host’, ‘server reflexive’ and ‘relay’ candidates are left to compete with each other, and the best will win, with the caveat that ‘host’ candidates have the highest priority, and ‘relay’ the lowest. This prioritization originates from the logical assumption that a relayed connection may be less performant than a direct one.


There are cases, though, when using a TURN service is not optional but mandatory; an RTCConfiguration setting, ‘iceTransportPolicy’, allows this when set to ‘relay’.


In any case, when TURN is used, it’s important to be able to troubleshoot the session establishment, and this article aims to provide some important guidelines.


These are the key points:

  • Acquiring the TURN settings

  • Confirming the reachability of the TURN server

  • Creating a relay allocation on the TURN server

  • Setting permissions for using the created allocations

  • Exchanging ICE connectivity checks over TURN

  • Exchanging media and/or data over TURN


Acquiring the TURN settings

While STUN servers are typically used without the need for authentication, a TURN service is unlikely to be usable without it. The resources involved in a TURN service are expensive, in particular in the case of highly scalable and distributed systems, and for this reason access is usually restricted to authenticated customers.


The required TURN settings are:

  • A URL (in the form ‘turn:<FQDN or IP address>:port’)

  • A username

  • A password (called ‘credential’)


These are provided inside the ‘iceServers’ configuration structure passed to the RTCPeerConnection at the moment of creation.

Troubleshooting points

It’s important to verify that the TURN settings are correctly configured; in Chrome, open Developer Tools and check in the JavaScript code that the ‘iceServers’ structure contains valid values.


Also check the ‘iceTransportPolicy’ (whose default value is ‘all’).
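Part of this check can be automated. The following sketch works on a Python dictionary whose keys mirror the JavaScript RTCConfiguration fields (`iceServers`, `urls`, `username`, `credential`, `iceTransportPolicy`). The helper `check_rtc_configuration`, the host `turn.example.com` and the credentials are all hypothetical. The helper only catches obvious mistakes and cannot tell whether the credentials are accepted by the server.

```python
from typing import List


def check_rtc_configuration(config: dict) -> List[str]:
    """Return a list of problems found in an RTCConfiguration-like dict."""
    problems = []
    policy = config.get("iceTransportPolicy", "all")  # browser default is 'all'
    if policy not in ("all", "relay"):
        problems.append(f"unknown iceTransportPolicy {policy!r}")
    has_turn = False
    for i, server in enumerate(config.get("iceServers", [])):
        urls = server.get("urls", [])
        for url in [urls] if isinstance(urls, str) else urls:
            scheme = url.split(":", 1)[0]
            if scheme in ("turn", "turns"):
                has_turn = True
                if not server.get("username") or not server.get("credential"):
                    problems.append(f"iceServers[{i}]: {url} has no username/credential")
            elif scheme not in ("stun", "stuns"):
                problems.append(f"iceServers[{i}]: unexpected URL {url}")
    if policy == "relay" and not has_turn:
        problems.append("iceTransportPolicy is 'relay' but no TURN URL is configured")
    return problems


config = {
    "iceServers": [{
        "urls": ["turn:turn.example.com:3478", "turn:turn.example.com:3478?transport=tcp"],
        "username": "alice",
        "credential": "secret",
    }],
    "iceTransportPolicy": "relay",
}
print(check_rtc_configuration(config))  # []
```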

Confirming the reachability of the TURN server

When the ICE candidate gathering phase begins, the ICE client verifies that the TURN URL defines a reachable service by sending a STUN Binding Request towards the IP address and port resolved from the ‘iceServers’ settings.


This request originates from the IP address and port that will be used to access the TURN service, so it also confirms that this source can reach the service.


If the STUN Binding Request is received by the TURN server, then it will respond with a STUN Binding Success Response, carrying an attribute (XOR-MAPPED-ADDRESS) that tells which source IP address and port were seen by the server.


If the STUN Binding Success response is received by the client, then there’s proof that the TURN server is reachable.
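When reading a trace, it can help to decode the XOR-MAPPED-ADDRESS by hand. The port is XOR-ed with the top 16 bits of the magic cookie. An IPv4 address is XOR-ed with the cookie, and an IPv6 address is XOR-ed with the cookie followed by the transaction ID. This is a minimal sketch, and the same decoding applies to XOR-RELAYED-ADDRESS (type 0x0016) and XOR-PEER-ADDRESS (type 0x0012).

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def iter_attributes(msg: bytes):
    """Yield (type, value) for each attribute of a STUN message, skipping padding."""
    length = struct.unpack_from("!H", msg, 2)[0]
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", msg, offset)
        yield attr_type, msg[offset + 4:offset + 4 + attr_len]
        offset += 4 + attr_len + (4 - attr_len % 4) % 4


def decode_xor_address(value: bytes, transaction_id: bytes):
    """Decode an XOR-encoded (IP, port) attribute value."""
    family = value[1]
    port = struct.unpack_from("!H", value, 2)[0] ^ (MAGIC_COOKIE >> 16)
    if family == 0x01:  # IPv4: XOR with the magic cookie
        addr = struct.unpack_from("!I", value, 4)[0] ^ MAGIC_COOKIE
        return str(ipaddress.IPv4Address(addr)), port
    # IPv6: XOR with magic cookie || transaction ID (16 bytes)
    key = struct.pack("!I", MAGIC_COOKIE) + transaction_id
    raw = bytes(a ^ b for a, b in zip(value[4:20], key))
    return str(ipaddress.IPv6Address(raw)), port


def mapped_address(response: bytes):
    """Return the (ip, port) seen by the server, or None if absent."""
    transaction_id = response[8:20]
    for attr_type, value in iter_attributes(response):
        if attr_type == ATTR_XOR_MAPPED_ADDRESS:
            return decode_xor_address(value, transaction_id)
    return None
```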


For example:



Now it’s possible to negotiate a relay allocation.

Troubleshooting points

In the host running the WebRTC client, take a network trace and verify that the STUN Binding Request is addressed to the expected destination (in particular if the TURN URL required a DNS resolution and so there are multiple IP addresses that could be used).

Verify in the trace that the STUN Binding Success Response is received.

Creating a relay allocation on the TURN server

This is the key element: the client asks the TURN server to become a relay on its behalf.


Here the TURN protocol is used, and the client issues an Allocate Request towards the TURN server. This request must be authenticated, for the reasons discussed earlier, and so it’s challenged with a 401 Unauthorized error response, carrying a realm and a nonce.


The client will use the provided credentials (username and credential), together with the given realm and nonce, to compute a MESSAGE-INTEGRITY attribute, and will send the Allocate Request again with this attribute.


If the credentials are correct (and the user is allowed to access the service), then the TURN service will reserve a transport address for that allocation: this is the relay transport address. An Allocate Success Response is transmitted to the client, with an XOR-RELAYED-ADDRESS attribute.
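The retried Allocate Request uses STUN long-term credentials. The HMAC-SHA1 key is the MD5 hash of `username:realm:password`, and the REALM and NONCE received in the 401 response are echoed back. This minimal sketch asks for a UDP relay through REQUESTED-TRANSPORT and skips password preprocessing. All the values in the usage check are hypothetical.

```python
import hashlib
import hmac
import struct

MAGIC_COOKIE = 0x2112A442
ALLOCATE_REQUEST = 0x0003
ATTR_USERNAME = 0x0006
ATTR_MESSAGE_INTEGRITY = 0x0008
ATTR_REALM = 0x0014
ATTR_NONCE = 0x0015
ATTR_REQUESTED_TRANSPORT = 0x0019
UDP_PROTOCOL = 17


def stun_attr(attr_type: int, value: bytes) -> bytes:
    """Encode a STUN attribute as type/length/value, padded to a 4-byte boundary."""
    padding = (4 - len(value) % 4) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * padding


def long_term_key(username: str, realm: str, password: str) -> bytes:
    """Long-term credential key: MD5(username ":" realm ":" password)."""
    return hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()


def build_allocate_retry(username: str, password: str, realm: str,
                         nonce: bytes, transaction_id: bytes) -> bytes:
    """Allocate Request resent after a 401, echoing the server's REALM and NONCE."""
    body = (stun_attr(ATTR_REQUESTED_TRANSPORT, bytes([UDP_PROTOCOL, 0, 0, 0]))
            + stun_attr(ATTR_USERNAME, username.encode("utf-8"))
            + stun_attr(ATTR_REALM, realm.encode("utf-8"))
            + stun_attr(ATTR_NONCE, nonce))
    # The length in the HMAC-ed header includes the 24-byte MESSAGE-INTEGRITY.
    header = struct.pack("!HHI", ALLOCATE_REQUEST, len(body) + 24, MAGIC_COOKIE) + transaction_id
    key = long_term_key(username, realm, password)
    mac = hmac.new(key, header + body, hashlib.sha1).digest()
    return header + body + stun_attr(ATTR_MESSAGE_INTEGRITY, mac)
```

Because the key depends on the realm, a credential that works for one realm is rejected with another 401 if the server announces a different realm.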


At this point the client has gained a ‘relay’ candidate and transmits it to the remote party through the signalling system in use (this is service-specific and not standardized).


Here’s an example of a successful allocation:



Note that a client may create more than one allocation for the same session; each one uses a different client source port, so it is easily identifiable. You can isolate one with a filter like `stun and udp.port==PORT`, where PORT is the client source port of the transaction you’re interested in.

Troubleshooting points

In the host running the WebRTC client, take a network trace and confirm that there’s an Allocate Success Response.


Wrong credentials

In case of wrong credentials, instead of an Allocate Success Response you’ll see another 401 Unauthorized response. In this case you must check that the credentials are correct and that the user is authorized to access the service.



Other errors for Allocate Request

Any other error for the Allocate Request will carry a detailed error code (similar to HTTP or SIP status codes), so take note of it and search for its root cause.
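If the dissector does not show the reason, the error code can be read from the ERROR-CODE attribute (type 0x0009). Its third byte holds the class (the hundreds digit), and its fourth byte holds the number (0 to 99). The table below lists the codes defined by STUN (RFC 5389) and TURN (RFC 5766) that are most relevant here. It is a sketch, not an exhaustive registry.

```python
# Error codes defined by STUN (RFC 5389) and TURN (RFC 5766)
ERROR_REASONS = {
    300: "Try Alternate", 400: "Bad Request", 401: "Unauthorized",
    403: "Forbidden", 420: "Unknown Attribute", 437: "Allocation Mismatch",
    438: "Stale Nonce", 441: "Wrong Credentials",
    442: "Unsupported Transport Protocol", 486: "Allocation Quota Reached",
    500: "Server Error", 508: "Insufficient Capacity",
}


def decode_error_code(value: bytes):
    """Decode an ERROR-CODE attribute value into (code, reason phrase)."""
    error_class = value[2] & 0x07   # hundreds digit (3 to 6)
    number = value[3]               # 0 to 99
    code = error_class * 100 + number
    reason = value[4:].decode("utf-8", errors="replace")
    # Fall back to the standard phrase when the server sends none
    return code, reason or ERROR_REASONS.get(code, "unknown")
```

For example, 438 Stale Nonce usually just means the client must retry with the new NONCE supplied by the server, while 486 or 508 point to limits on the server side.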

Setting permissions for using the created allocations


For security reasons, before media or data is exchanged through the relay, the client must set specific permissions for the remote party.


Once the client has a valid relay allocation, every time it receives an ICE candidate from the remote party it must set a permission for the remote IP address.


This is accomplished with a TURN CreatePermission Request. The allocation the permission refers to is identified implicitly by the client source IP address and port. The TURN server will respond with a CreatePermission Success Response if the request is accepted; note that a client often receives ICE candidates with private or reserved IP addresses: in that case the TURN server will most probably reject the request with a 403 Forbidden response.
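In TURN, permissions apply to an IP address only, and the port is ignored. Several remote candidates on the same IP therefore need a single permission, and that permission expires after 300 seconds unless it is refreshed. The following hypothetical helper sketches how to anticipate which addresses a server is likely to reject. Server policies vary, so the classification is a heuristic and not a prediction of any specific TURN service's behavior.

```python
import ipaddress
from typing import Iterable, List, Tuple


def plan_permissions(remote_candidates: Iterable[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Split remote candidate IPs into (worth requesting, likely to get 403).

    Permissions are per IP address (the port is ignored), so duplicates are dropped.
    """
    seen = set()
    to_request, likely_forbidden = [], []
    for ip, _port in remote_candidates:
        if ip in seen:
            continue
        seen.add(ip)
        addr = ipaddress.ip_address(ip)
        if (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_unspecified or addr.is_reserved):
            likely_forbidden.append(ip)
        else:
            to_request.append(ip)
    return to_request, likely_forbidden
```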


Example:



Troubleshooting points

In the host running the WebRTC client, take a network trace and confirm that there’s a CreatePermission Success for at least one of the remote candidates.


If no CreatePermission requests are sent, or none of them is accepted, then no relaying will be possible.

Exchanging ICE connectivity checks over TURN

Once the TURN server is reached, a relay allocation reserved and a permission created, the conditions are in place for exchanging ICE connectivity checks over TURN.


These are performed by sending STUN Binding Requests with short-term credentials; the peculiarity with TURN is that these Binding Requests are encapsulated inside a TURN Send Indication, addressed to the remote peer.


Wireshark conveniently resolves this encapsulation for you: instead of showing a Send Indication, it will show its content, the Binding Request.


The TURN server will relay the Binding Request to the remote peer, performing the relay for the first time. The expected outcome is that the remote entity will respond with a Binding Success, which the TURN server will encapsulate inside a Data Indication and deliver to the client.
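On the way back, the client has to unwrap the Data Indication to find the peer and the relayed Binding response. This minimal sketch handles IPv4 peers only. It checks the first two bytes of the payload for the Binding Success Response type (0x0101), and the address in the usage check is hypothetical.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442
DATA_INDICATION = 0x0017  # method Data (0x007), class indication
ATTR_XOR_PEER_ADDRESS = 0x0012
ATTR_DATA = 0x0013
BINDING_SUCCESS = 0x0101


def parse_data_indication(msg: bytes):
    """Return ((peer_ip, peer_port), payload) from an IPv4 Data Indication."""
    msg_type, length, cookie = struct.unpack_from("!HHI", msg, 0)
    if msg_type != DATA_INDICATION or cookie != MAGIC_COOKIE:
        raise ValueError("not a TURN Data Indication")
    peer, payload = None, None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", msg, offset)
        value = msg[offset + 4:offset + 4 + attr_len]
        if attr_type == ATTR_XOR_PEER_ADDRESS and value[1] == 0x01:  # IPv4 only
            port = struct.unpack_from("!H", value, 2)[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack_from("!I", value, 4)[0] ^ MAGIC_COOKIE
            peer = (str(ipaddress.IPv4Address(addr)), port)
        elif attr_type == ATTR_DATA:
            payload = value
        offset += 4 + attr_len + (4 - attr_len % 4) % 4  # skip padding
    return peer, payload


def is_binding_success(payload: bytes) -> bool:
    """True if the relayed payload looks like a STUN Binding Success Response."""
    return len(payload) >= 20 and struct.unpack_from("!H", payload, 0)[0] == BINDING_SUCCESS
```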


If that happens, then the client has learned that the remote candidate is indeed reachable via TURN and that’s a suitable candidate pair for exchanging media and data.

Troubleshooting points

In the host running the WebRTC client, take a network trace and confirm that there are Binding Requests carried over TURN that receive a Binding Success.


If Binding Success responses are not received, then something is blocking them, and the best way to investigate is to take network traces on the TURN server host, if possible. Those traces will tell you whether the Binding Requests are correctly leaving the TURN server towards the remote party and whether the Binding Success responses are being received or not.


It’s possible that the remote endpoint is simply unreachable from the TURN service, and in this case the ICE candidate pair will be marked as unusable.

Exchanging media and/or data over TURN


The last fundamental step is the actual exchange of packets through the relay. Typically these are RTP packets.


Once the connectivity checks succeed, if the client has selected the relay candidate as the one to be used, then RTP can start flowing. You’ll be able to see the RTP packets flowing in both directions, typically with audio and video multiplexed.


There are two ways of transmitting data:

  • Indications

  • Channels


A Send Indication carries the data (RTP) and the destination from the client to the TURN server. The TURN server, provided that the allocation exists and the permission allows it, will extract the data and send it to the destination from the allocated relay transport address.


When the data arrives from the remote peer to the relay transport address, then the TURN server, after performing the above checks, will encapsulate the data inside a Data Indication and send it to the client.


There is, however, a more efficient way to exchange data: the client can define a Channel (through the ChannelBind request), which associates a channel ID with a remote party. From that moment both the client and the TURN server can exchange data via ChannelData messages carrying just the channel ID and data, omitting the remote transport address. This reduces the network and computing overhead, and it is typically preferred over Indications.
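A ChannelData message is just a 4-byte header (channel number and data length) followed by the data. This sketch follows RFC 5766, which allows channel numbers from 0x4000 to 0x7FFF (its successor, RFC 8656, narrows the range to 0x4000 to 0x4FFF). Over TCP or TLS the message is padded to a multiple of 4 bytes, while over UDP the padding is not required.

```python
import struct


def encode_channel_data(channel: int, data: bytes, over_tcp: bool = False) -> bytes:
    """Frame data as a TURN ChannelData message (4-byte header)."""
    if not 0x4000 <= channel <= 0x7FFF:
        raise ValueError("channel numbers must be in 0x4000-0x7FFF")
    frame = struct.pack("!HH", channel, len(data)) + data
    if over_tcp:
        # Over TCP/TLS the message is padded to a multiple of 4 bytes.
        frame += b"\x00" * ((4 - len(data) % 4) % 4)
    return frame


def decode_channel_data(frame: bytes):
    """Return (channel, data) from a ChannelData message, ignoring any padding."""
    channel, length = struct.unpack_from("!HH", frame, 0)
    if not 0x4000 <= channel <= 0x7FFF:
        raise ValueError("not a ChannelData message")
    return channel, frame[4:4 + length]


rtp = bytes(160)  # stand-in for a 160-byte RTP packet
print(len(encode_channel_data(0x4000, rtp)) - len(rtp))  # 4
```

By comparison, a Send Indication carrying the same packet to an IPv4 peer adds at least 36 bytes: the 20-byte STUN header, the 12-byte XOR-PEER-ADDRESS attribute and the 4-byte DATA attribute header.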

Troubleshooting points

In the host running the WebRTC client, take a network trace and confirm that data is being sent from the client with Send Indications or ChannelData messages, and to the client with Data Indications or ChannelData messages.


In case of unidirectional (one-way) media, it’s advisable to take network traces on the TURN server host to clarify whether the media is being exchanged or not on the relay side with the remote peer.

Encrypted TURN

It’s possible to use TURN over TLS, with all the data exchanged encrypted. In this case using Wireshark as described won’t allow you to see the details of the requests and responses, and troubleshooting is harder.


One possible approach is to first ensure that all the operations described previously happen correctly when using unencrypted TURN (over UDP or TCP). It’s very likely that the TURN service you are using is accessible over unencrypted UDP (the default behavior): before moving to TLS, ensure UDP works fine.


Wireshark will still show you the TLS connections established with the server, confirming whether the connection was successful, the TLS session established, and some application data exchanged.

Useful tools

Wireshark

Wireshark is available for a variety of platforms; it’s a fundamental tool to understand what’s happening between the local WebRTC client and the remote server.


It comes with filters that detect the type of packets. You can use `stun` to display only STUN and TURN packets, and even select specific TURN transactions, like `stun.type.method==0x0003` to show Allocate Requests and Responses.
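The method filter matches both requests and responses because the 16-bit STUN message type interleaves a 12-bit method with a 2-bit class (request, indication, success response, error response). This is why the Allocate Success Response appears as type 0x0103 while its method is still 0x0003. The following sketch encodes and decodes the type following RFC 5389, with a method table limited to the STUN and TURN methods discussed in this article.

```python
CLASSES = {0b00: "Request", 0b01: "Indication",
           0b10: "Success Response", 0b11: "Error Response"}
METHODS = {0x001: "Binding", 0x003: "Allocate", 0x004: "Refresh", 0x006: "Send",
           0x007: "Data", 0x008: "CreatePermission", 0x009: "ChannelBind"}


def encode_type(method: int, cls: int) -> int:
    """Interleave method bits M0-M11 with class bits C0 (bit 4) and C1 (bit 8)."""
    return ((method & 0x000F)
            | ((method & 0x0070) << 1)
            | ((method & 0x0F80) << 2)
            | ((cls & 0b01) << 4)
            | ((cls & 0b10) << 7))


def decode_type(msg_type: int):
    """Split a STUN message type into (method name, class name)."""
    method = (msg_type & 0x000F) | ((msg_type & 0x00E0) >> 1) | ((msg_type & 0x3E00) >> 2)
    cls = ((msg_type >> 4) & 0b01) | ((msg_type >> 7) & 0b10)
    return METHODS.get(method, hex(method)), CLASSES[cls]


print(hex(encode_type(0x003, 0b10)))  # 0x103
```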


Saving a trace into a pcap file and making it available to others helps enormously with troubleshooting.


Wireshark can be used for both capturing and just displaying captures.


There are cases where the dissectors, i.e. the interpreters of the packets, don’t recognize a TURN transaction. This happens, for example, when the transaction uses a non-default port (the defaults are 3478 for UDP and TCP, 5349 for TLS). To “help” Wireshark, right click on a packet, select “Decode As…” and set ‘STUN’ as protocol: it will then correctly interpret all the packets using that non-default port.


The same applies to RTP: when signalling is not available to Wireshark, UDP packets containing RTP may not be correctly interpreted. Use the same “Decode As…” method.
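When deciding what to pick in “Decode As…”, the first byte of the UDP payload is often enough. The following is a heuristic sketch based on the demultiplexing ranges of RFC 7983, which WebRTC uses to tell apart the protocols sharing one port. The payload of a ChannelData message can be classified again after stripping its 4-byte header.

```python
def classify_packet(packet: bytes) -> str:
    """Guess the protocol of a UDP payload from its first byte (RFC 7983 ranges)."""
    if not packet:
        return "empty"
    first = packet[0]
    if first <= 3:
        return "STUN/TURN"
    if 16 <= first <= 19:
        return "ZRTP"
    if 20 <= first <= 63:
        return "DTLS"
    if 64 <= first <= 79:
        return "TURN ChannelData"
    if 128 <= first <= 191:
        return "RTP/RTCP"
    return "unknown"
```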

tcpdump

On the server side, any tool for packet capture would do, with tcpdump being a common solution.


Save the trace into a pcap file with the `-w` option, e.g. `tcpdump -n -v -w trace_1.pcap`, copy it to your machine and use Wireshark to display the packets.


WebRTC samples, Trickle ICE


This open source tool allows you to verify the browser can correctly gather `relay` candidates with the given TURN server details (URL, username, credential).


Before troubleshooting a client implementation, ensure that this tool can correctly access the TURN resources you’re referring to.


turnutils_uclient

The popular open source implementation of a TURN server, coturn, comes with a tool that simulates a client. A plethora of options are available, allowing you to test specific aspects of the TURN operations, e.g. using Send Indications or using Channels, etc.


Use `turnutils_uclient` to ensure the TURN service you want to use is accessible with the given TURN settings. You’ll also get information about the round trip time and jitter.


Chrome webrtc-internals

When using Chrome, the best way to understand what’s happening is to open a tab on chrome://webrtc-internals/. It will show you all the information related to each RTCPeerConnection being managed by the browser at that point, including the list of ICE candidates, the details of the TURN server being used (except the credential, for obvious reasons), the iceTransportPolicy (‘all’ or ‘relay’), the chosen ICE candidate pair, statistics on media transfer, etc.


Search for `relay` candidates and verify the client is able to retrieve them from the TURN service, and whether they are selected as the candidate pair or not.


Conclusions

This article should provide a good checklist for troubleshooting the connection to a TURN service. There is much more to say, in particular regarding browsers other than Chrome and server-side investigations: I plan to write about it in the future.


