Building on Laurent Bernaille’s excellent three-part deep dive into Docker’s overlay networks, I wanted to experiment with using an overlay network to tunnel traffic between the nodes of a Swarm outside of the container environment, essentially treating the overlay network as a mesh VPN between host nodes.
Prepare the host
The first step is to symlink Docker’s netns mount points to a location where we can use the iproute2 utilities to interact with them:
ln -s /var/run/docker/netns /var/run/netns
We can now view Docker’s network namespaces with ip netns. Comparing the output with Docker’s networks, we can identify the relevant netns, as each one is named after its Docker network ID (look for the lines marked <-):
~# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
ed31264a1f4f   bridge            bridge    local
5ef35596d5b1   docker_gwbridge   bridge    local
or1wj1px3q8b   testoverlay       overlay   swarm   <-
bf9f478ebd5d   host              host      local
scybkysot08x   ingress           overlay   swarm
a9871bc3532d   none              null      local
~# ip netns
fe5b42ad2e7e (id: 3)
1-or1wj1px3q (id: 2)   <-
lb_or1wj1px3
1-scybkysot0 (id: 0)
ingress_sbox (id: 1)
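The naming pattern visible in that output can be captured in a small helper. This is a sketch based only on the listing shown here: it assumes the namespace name is "1-" followed by the first 10 characters of the network ID, which is an observation rather than documented behaviour. The function name overlay_netns is made up for illustration.

```shell
# Derive the overlay namespace name from a Docker network ID.
# Assumption: the name is "1-" plus the first 10 characters of the ID,
# as inferred from the listing above.
overlay_netns() {
    printf '1-%.10s\n' "$1"
}

# Example with the ID from the listing; on a real host the ID could come from
#   docker network inspect -f '{{.Id}}' testoverlay
overlay_netns or1wj1px3q8b
```

The result can then be checked against the names printed by ip netns before using it in any of the later commands.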
Now that we’ve identified the correct netns for the overlay network, we need to add a virtual interface to the host to give us access into the overlay network’s namespace.
Accessing the Overlay network from the Docker host
Create a new veth pair
veth interfaces are created in pairs and allow us to pass traffic between namespaces. First, we’ll create a new veth pair and move one end into the relevant namespace for the Docker overlay network we want to connect into:
ip link add dev veth1 type veth peer name veth2
ip link set dev veth2 netns 1-or1wj1px3q
Assign an IP address and set the MAC address on the host interface
First we give our host interface (veth1) an IP address in the overlay network’s subnet:
ip a a 10.0.0.100/24 dev veth1
For consistency, we use the same MAC address scheme as Docker: a fixed 02:42 prefix followed by the four bytes of the interface’s IPv4 address in hex, so 10.0.0.100 becomes 0a:00:00:64:
ip link set dev veth1 address 02:42:0a:00:00:64
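Working out the hex by hand gets tedious with more than a couple of nodes. The following sketch derives the MAC from an IPv4 address, assuming the scheme is a fixed 02:42 prefix followed by the four address octets, which is what the address used above matches. The function name ip_to_mac is hypothetical.

```shell
# Build a Docker-style MAC address from a dotted-quad IPv4 address.
# Assumption: fixed 02:42 prefix followed by the four IP octets in hex.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_mac() (
    IFS=.
    set -- $1   # split "10.0.0.100" into 10 0 0 100
    printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
)

ip_to_mac 10.0.0.100
```

For 10.0.0.100 this prints 02:42:0a:00:00:64, the address set on veth1 above.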
Add veth2 to Docker’s bridge device
Bridge devices act like small switches, so to connect us into the Overlay network we need to plug one end of our veth pair (veth2 — the one we migrated into Docker’s netns) into Docker’s bridge device.
ip netns exec 1-or1wj1px3q ip link set master br0 veth2
This is the same method used by Docker to connect containers into the local Overlay network on each node.
To allow for the size of the VXLAN headers, we need to drop the MTU on our interfaces:
ip netns exec 1-or1wj1px3q ip link set mtu 1450 veth2
ip link set mtu 1450 dev veth1
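The value 1450 follows from the per-packet encapsulation overhead. As a rough sketch, assuming a 1500-byte MTU on the underlying network and an IPv4 outer header (both assumptions, adjust for your environment), the arithmetic is:

```shell
underlay_mtu=1500    # assumed MTU of the physical network
outer_ipv4=20        # outer IPv4 header
outer_udp=8          # outer UDP header
vxlan_header=8       # VXLAN header
inner_ethernet=14    # encapsulated Ethernet header of the original frame

overhead=$((outer_ipv4 + outer_udp + vxlan_header + inner_ethernet))
overlay_mtu=$((underlay_mtu - overhead))
echo "VXLAN overhead: ${overhead} bytes, overlay MTU: ${overlay_mtu}"
```

With these numbers the overhead comes to 50 bytes, leaving 1450 for the overlay interfaces. A smaller underlay MTU would need a correspondingly smaller value.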
Bring the interfaces up
Now everything’s configured, the last step for connecting our local host into the overlay network is to bring up the virtual interfaces:
ip netns exec 1-or1wj1px3q ip link set up dev veth2
ip link set up dev veth1
If everything is configured correctly, we’re now able to access containers running on our local host and attached to the overlay network directly. You can test this by firing up a container attached to the network (assuming the network was created with the attachable flag) and pinging the assigned container IP directly from the host.
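Since these steps have to be repeated on every node with a different address each time, it can help to generate them. The sketch below only prints the commands from this section for a given namespace, address and MAC so they can be reviewed first; it does not run anything. The function name print_local_setup is made up, and the commands use the long forms (ip addr add, explicit dev) of the ones shown above.

```shell
# Print (not run) the host-side setup commands for one node.
# $1 = overlay netns name, $2 = overlay IP with prefix length, $3 = MAC for veth1
print_local_setup() {
    ns=$1 addr=$2 mac=$3
    cat <<EOF
ip link add dev veth1 type veth peer name veth2
ip link set dev veth2 netns $ns
ip addr add $addr dev veth1
ip link set dev veth1 address $mac
ip netns exec $ns ip link set dev veth2 master br0
ip netns exec $ns ip link set dev veth2 mtu 1450
ip link set dev veth1 mtu 1450
ip netns exec $ns ip link set dev veth2 up
ip link set dev veth1 up
EOF
}

print_local_setup 1-or1wj1px3q 10.0.0.100/24 02:42:0a:00:00:64
```

Once the printed commands look right, they can be piped to sh as root on the node in question.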
One of the key features of Docker’s Overlay networks is the ability for containers to communicate across different host nodes, so the next step is to enable forwarding over the VXLAN tunnel for our new interfaces.
Configure forwarding over VXLAN overlay
Once all of the nodes in the Swarm have been configured to access the overlay network locally (as above), we need to tell the kernel how traffic should be passed between hosts. Usually Swarm manages this for us, but since we’re working outside of the Swarm context we need to configure this manually.
Create permanent ARP entries
As we don’t have a Layer 2 network spanning the network namespaces on our nodes, we need to manually create ARP table entries to tell the kernel through which device to send traffic destined for the other side of our VXLAN tunnel. Here we add the IP and MAC address of the veth1 (host side) interface on the remote side of the tunnel, and tell the kernel to forward traffic via the vxlan0 device. This device is created by Docker as part of the Swarm overlay network.
ip netns exec 1-or1wj1px3q ip n a 10.0.0.101 lladdr 02:42:0a:00:00:65 nud permanent dev vxlan0
This needs to be completed on each of the nodes in the Swarm, with an entry added for every other node. Note that since the overlay network is part of the distributed Swarm config, it has the same network ID on every node, and hence its network namespace has the same name on every node.
With the kernel now forwarding traffic to the vxlan0 device, we just need to configure our bridge’s forwarding database to pass the traffic to the correct remote host over the VXLAN tunnel.
Again, on each node we need to configure the correct remote endpoint for every other node. Here we tell the forwarding database that MAC address 02:42:0a:00:00:65 (the veth1 interface on the remote host with overlay IP address 10.0.0.101) resides on the remote host with IP x.x.x.x (where x.x.x.x is the IP address used as Swarm’s advertise-addr):
ip netns exec 1-or1wj1px3q bridge fdb add 02:42:0a:00:00:65 dev vxlan0 dst x.x.x.x self permanent
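With more than two nodes, the neighbour and FDB entries multiply quickly. The following sketch prints both commands for each remote node from a list of overlay and underlay address pairs. It only prints, it does not execute. The function names are made up, the MAC derivation assumes the 02:42 plus IP octets scheme used above, and the 192.0.2.x underlay addresses are documentation placeholders standing in for each node’s advertise-addr.

```shell
# Build a Docker-style MAC (assumed scheme: 02:42 + the four IPv4 octets).
ip_to_mac() (
    IFS=.
    set -- $1
    printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
)

# Print the permanent neighbour and FDB entries for every remote node.
# $1 = overlay netns name; remaining args = "overlay_ip,underlay_ip" pairs.
print_peer_entries() {
    ns=$1
    shift
    for peer in "$@"; do
        overlay_ip=${peer%%,*}    # text before the comma
        underlay_ip=${peer#*,}    # text after the comma
        mac=$(ip_to_mac "$overlay_ip")
        printf 'ip netns exec %s ip neigh add %s lladdr %s nud permanent dev vxlan0\n' \
            "$ns" "$overlay_ip" "$mac"
        printf 'ip netns exec %s bridge fdb add %s dev vxlan0 dst %s self permanent\n' \
            "$ns" "$mac" "$underlay_ip"
    done
}

# Hypothetical peers: overlay IP, then that node's advertise-addr.
print_peer_entries 1-or1wj1px3q 10.0.0.101,192.0.2.11 10.0.0.102,192.0.2.12
```

Each node would run this with its own list of peers, leaving itself out.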
We can now test connectivity by pinging another node’s overlay IP, for example from the node holding 10.0.0.101 to the host we configured above (10.0.0.100):
~# ping 10.0.0.100
PING 10.0.0.100 (10.0.0.100) 56(84) bytes of data.
64 bytes from 10.0.0.100: icmp_seq=1 ttl=64 time=10.4 ms
64 bytes from 10.0.0.100: icmp_seq=2 ttl=64 time=10.0 ms
We’ve demonstrated how Docker’s Overlay networks use VXLANs and network namespaces to provide container network isolation and inter-host communication, as well as how we can manipulate these to tunnel traffic between host nodes across one of those overlay networks.
Note that bridging a host interface into an overlay network is definitely not the recommended way to expose containers to the Docker host, or to the Docker host’s network. There are also other factors to consider such as IP addresses conflicting with Docker’s IPAM and persistence — Docker is clever enough to only configure an overlay network on a host when it’s required by a running container, so it’s not necessarily safe to rely on existence of the overlay network on any given node.