
Docker networking and a tricky behaviour

There's something about debugging: the more experience you have finding the root cause of bugs in the past, the higher the hope and confidence that you'll squash the one currently under the microscope.

You see, I didn't mention "fixing" a bug, because I think finding the root cause of any bug has value per se. Fixing it is a separate adventure.

Anyway, this week I was entirely bamboozled by this: a Docker container was re-deployed with a different networking configuration. In particular, with Compose, I was dedicating a network to that container (let's say 172.18.0.0/24), separate from the default Docker network (e.g. 172.17.0.0/24).

In docker-compose.yml:

networks:
  apps:
    driver: bridge
    ipam:
      driver: default
      config:
      - subnet: 172.18.0.0/24
        gateway: 172.18.0.1


In the "service" definition I just added this:

services:
  THESERVICE:
    ..
    networks:
      - apps

No need to be more specific with names, because Compose prefixes the network name with the project name, so in the Docker network list you see something like MYPROJECT_apps.
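
Just to illustrate, listing the networks on the host shows the prefixed name (the IDs here are made up):

$ docker network ls
NETWORK ID     NAME             DRIVER    SCOPE
0123456789ab   bridge           bridge    local
abcdef012345   MYPROJECT_apps   bridge    local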

This change worked perfectly on two previous deployments - in fact they were deployments on intermediate stages, in preparation for a deployment on a third stage. Everything went smoothly there, no big deal, with Compose updating the networking as the new configuration was brought up.

This particular container exposes a UDP port (say 8888) and is expected to receive UDP packets from the VLAN via the host interface. Packets arrive at the host's UDP port 8888 and are forwarded to the container's UDP port 8888, where they are processed. Very simple.
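
For context, the UDP port publication in Compose would look something like this (a sketch, assuming a plain host-to-container mapping for port 8888):

services:
  THESERVICE:
    ..
    ports:
      - "8888:8888/udp"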

But when I did the deployment on the third environment, reasonably similar to the first two, things just didn't work. tcpdump showed that the UDP packets were still reaching the host, but they were not forwarded to the container.

The forwarding rules - inspected with iptables - were correct. The container had the expected IP address. ip route was correct. Where were those packets going, then?
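
For the record, checks of that kind can be done along these lines (a sketch, not necessarily the exact commands used; THESERVICE is the placeholder name from the Compose file above):

# Docker's DNAT rules for published ports (nat table, DOCKER chain)
iptables -t nat -vnL DOCKER

# Routes towards the Docker networks
ip route

# The IP address actually assigned to the container
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' THESERVICE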

I ssh'd into one of the machines that were sending UDP traffic to the receiving host, opened a netcat connection to that host on port 8888, and sent some data. Those packets were correctly received by the container! So some traffic with similar characteristics was received, some wasn't.

Perhaps an issue with data length? Nope. I tried sending big UDP packets and they were correctly received too. It was a weak lead anyway, because the source traffic came in various lengths, and yet not a single packet was forwarded.
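
For reference, this kind of manual test can be reproduced with netcat, with both a short and a larger payload (host name and sizes are illustrative):

# Small payload
echo "test" | nc -u -w1 RECEIVING_HOST 8888

# Larger payload, close to a typical MTU
head -c 1400 /dev/urandom | nc -u -w1 RECEIVING_HOST 8888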

Errors logged somewhere? None.

iptables stats confirmed that the packets were arriving, and that they were not being routed to the container's virtual interface.
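
The per-rule packet and byte counters make this easy to observe, with something like this (a sketch):

# Watch the counters on the FORWARD chain and on the DOCKER chain it jumps to
watch -n1 'iptables -vnL FORWARD; iptables -vnL DOCKER'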

At this point I was enough out of ideas to start tapping the shoulder of some colleague. "I'm observing an interesting behaviour!". "Have you tried...": "Yes". "How about... ": "Checked." "WTF?": "Exactly".

It was indeed a WTF moment. Sure, there were some differences in the list of virtual interfaces on the problematic host, but none seemed to have an impact.

We looked deeper into the network traces. And there, gleaming in their "light green on black" magnificence, were the ICMP Destination Unreachable responses from the receiving host to the producing hosts. A confirmation that the packets were being sent towards an unreachable destination. Why? The forwarding rules were not just right, but also easily testable. It was just the "official" traffic that wasn't forwarded...

Then a colleague, one of the brightest I've ever had, started talking about tracked connections (/proc/net/nf_conntrack). We could see that our netcat connections were tracked, and so were the ones from the other hosts producing data. But there was an important detail: the connections from the "official" producing hosts were associated with the old container IP address (e.g. 172.17.0.2 instead of the new one, 172.18.0.2)!
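
The stale entries can be spotted either in /proc or with the conntrack tool, with something like this (the addresses follow the example subnets above):

# UDP flows towards port 8888, still pointing at the old container address
grep udp /proc/net/nf_conntrack | grep 8888

# Or, with conntrack-tools installed
conntrack -L -p udp --dport 8888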

The usage of conntrack is visible in iptables' output, in particular in the FORWARD chain:

-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
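
Its counterpart on the NAT side is the DNAT rule Docker installs for the published port, along these lines (bridge name and target address are illustrative):

-A DOCKER ! -i br-abcdef012345 -p udp -m udp --dport 8888 -j DNAT --to-destination 172.18.0.2:8888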

Why didn't we observe this behaviour in the other environments? Simple: there the traffic was more sporadic, so the tracked connections had time to expire. In the failing environment, the continuous flow of UDP packets kept refreshing the tracked connections, even though the forwarding was actually failing.
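
The relevant timeouts are kernel settings; for UDP flows they are in the order of tens of seconds to a couple of minutes (the exact defaults depend on the kernel), so sporadic traffic lets the entry expire, while a steady flow keeps renewing it:

sysctl net.netfilter.nf_conntrack_udp_timeout
sysctl net.netfilter.nf_conntrack_udp_timeout_stream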

Only at that point did I have the right scenario described, and I could find this Docker issue. From there:
When you restart the container, container's ip has changed, so the DNAT rule, which will route to the new address. But the old connection's state in conntrack is not cleared. So when a packet arrives, it will not go through NAT table again, because it is not "the first" packet. So the solution is clearing the conntrack, [...].
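
In practice, assuming the conntrack tool (from conntrack-tools) is available on the host, clearing can be as targeted or as blunt as needed:

# Delete only the stale UDP entries for the published port
conntrack -D -p udp --dport 8888

# Or flush the whole connection tracking table
conntrack -F

Once the stale entries are gone, the next incoming packet is treated as a new flow, goes through the NAT table again and gets the DNAT to the new container address.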

It was a fun day :) I hope this may save somebody some hours of head scratching in the future.

