Skip to main content

Docker networking and a tricky behaviour

There's something about debugging: the more you have experienced finding the root cause of bugs in the past, the highest the hope and confidence you'll squash the one currently under the microscope.

You see I didn't mention "fixing" a bug, because I think finding the root cause of any bug has value per se. Fixing them is a separate adventure.

Anyway this week I was entirely bamboozled by this: a Docker container was re-deployed with a different networking configuration. In particular, with Compose, I was dedicating a network for that container (let's say 172.18.0.0/24) separate from the default Docker network (e.g. 172.17.0.0/24).

In docker-compose.yml:

networks:
  apps:
    driver: bridge
    ipam:
      driver: default
      config:
      - subnet: 172.18.0.0/24
        gateway: 172.18.0.1


In the "service" definition I just added this:

services:
  THESERVICE:
     ..
      networks:
        - apps

No need to be more specific with names because Compose will add the project name to the network, so in the Docker network list you see something like: MYPROJECT_apps.

This change worked perfectly on two previous deployments - in fact they were deployments on intermediate stages, in preparation for a deployment on a third stage. Everything went smoothly there, no big deal, Compose updating the networking as the new configuration was brought up.

This particular container exposes a UDP port (say 8888) and it's expected to receive UDP packets from the VLAN via the host interface. Packets arrive at the host's UDP 8888 and are forwarded to the container's UDP 8888, where they are processed. Very simple.

But when I did the deployment on the third environment, reasonably similar to the first two, things just didn't work. tcpdump was showing that the UDP packets were still received by the host but not forwarded to the container.

The forwarding rules - inspected with iptables - were correct. The container had the expected IP Address. ip route was correct. Where are those packets going then?

I ssh'd into one of the machines that were sending UDP traffic to the receiving host, opened a netcat connections to that host, port 8888, and sent some data. Those packets were correctly being received by the container! So some traffic with similar characteristics was received, some wasn't.

Perhaps an issue with data length? Nope. I tried sending big UDP packets and they were correctly received too. It was a weak lead anyway, because the source traffic was of various lengths while not a single packet was forwarded.

Errors logged somewhere? None.

iptables stats were confirming the arrival of the packets, and that they were not routed to the container's virtual interface.

At this point I was enough out of ideas to start tapping the shoulder of some colleague. "I'm observing an interesting behaviour!". "Have you tried...": "Yes". "How about... ": "Checked." "WTF?": "Exactly".

It was indeed a WTF moment. Surely there were some differences in the list of virtual interfaces on the problematic host, but none seemed to have an impact.

We looked deeper into the network traces. And there, glooming in their "light green on black" magnificence there were the ICMP Destination Unreachable responses from the receiving host to the producing hosts. A confirmation that packets were attempted to be sent to an unreachable destination. Why? The forwarding rules were not just right, but also easily testable. It was just the "official" traffic that wasn't forwarded...

Then a colleague, one of the brightest I've ever had, started talking about tracked connections (/proc/net/nf_conntrack). We could see our netcat connections were tracked, as they were the ones from the other hosts producing data. But there was an important detail. The connections from the "official" producing hosts were associated with the old container IP address (e.g. 172.17.0.2 instead of the new one, 172.18.0.2)!

The usage of conntrack is visible from iptables' output, in particular for the FORWARDING table:

-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Why didn't we observe this behaviour in the other environments? Simple: there the traffic was more sporadic, and so the tracked connections had the time to expire. In the failing environment the continuous flow of UDP packets kept refreshing the tracked connections, even though the forwarding was actually failing.

Only at that point I had the right scenario described, and I could find this Docker issue. From there:
When you restart the container, container's ip has changed, so the DNAT rule, which will route to the new address. But the old connection's state in conntrack is not cleared. So when a packet arrives, it will not go through NAT table again, because it is not "the first" packet. So the solution is clearing the conntrack, [...].

It was a fun day :) I hope this may save some hours of head scratching to somebody in the future.


Popular posts from this blog

Troubleshooting TURN

  WebRTC applications use the ICE negotiation to discovery the best way to communicate with a remote party. I t dynamically finds a pair of candidates (IP address, port and transport, also known as “transport address”) suitable for exchanging media and data. The most important aspect of this is “dynamically”: a local and a remote transport address are found based on the network conditions at the time of establishing a session. For example, a WebRTC client that normally uses a server reflexive transport address to communicate with an SFU. when running inside the home office, may use a relay transport address over TCP when running inside an office network which limits remote UDP targets. The same configuration (defined as “iceServers” when creating an RTCPeerConnection will work in both cases, producing different outcomes.

VoIP calls encoded with SILK: from RTP to WAV

SILK is a codec defined by Skype, but can be found in many VoIP clients, like CSipSimple . It comes in different flavours (sample rates and frame sizes), from narrowband (8 KHz) to wideband (24 KHz). Since Wireshark doesn't allow you to decode an RTP stream carrying SILK frames, I was curious to find a programmatic way to do it. In fact, this has also allowed to me to earn a "tumbleweed" badge in stackoverflow . You may argue that a Wireshark plugin would be the right solution, but that's probably for another day. Initially I thought it was sufficient to read the specification for RTP payload when using SILK ; the truth is that I had to reverse engineer a solution by looking at SILK SDK's test vectors. There, I discovered that a file containing SILK audio doesn't have the file header indicated in the IETF draft ("!#SILK"), but a slightly different one ("!#SILK_V3"). More importantly, each encoded frame is not preced...

Extracting Opus from a pcap file into an audible wav

From time to time I need to verify that the audio inside a trace is as expected. Not much in terms of quality, but more often content and duration. A few years ago I wrote a small program to transform a pcap into a wav file - the codec in use was SILK. These days I'm dealing with Opus , and I have to say things are greatly simplified, in particular if you consider opus-tools , a set of utilities to handle opus files and traces. One of those tools, opusrtp , can do live captures and write the interpreted payload into a .opus file. Still, what I needed was to achieve the same result but from a pcap already existing, i.e. "offline". So I come up with a small - quite shamlessly copy&pasted - patch to opusrtc, which is now in this fork . Once you have a pcap with an RTP stream with opus (say in input.pcap ) you can retrieve the .opus equivalent (in rtpdump.opus ) with: ./opusrtp --extract input.pcap Then you can generate an audible wav file with: ./opusd...