VXLAN on Linux: Point-to-Point Tunnels and Cross-Host Docker Networking
Linux VXLAN support
The in-kernel VXLAN driver landed in Linux 3.7 (2012). Production deployments typically target ≥3.9/3.10 for maturity and feature coverage. The driver supports IPv4/IPv6 underlay, unicast and multicast flood/learn, and integrates with iproute2.
Check availability by inspecting iproute2’s supported link types (the ip-link man page lists vxlan among the TYPE values):
# man ip-link
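A more direct check is to ask iproute2 for the vxlan type’s options and confirm the kernel driver is present (a sketch; modinfo only works when the driver is built as a module):

```shell
# Prints vxlan-specific usage if iproute2 knows the type; errors otherwise
ip link add type vxlan help

# Confirms the kernel driver exists (module-based kernels)
modinfo vxlan | head -n 3
```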
Test environment:
- OS: CentOS Linux release 7.4.1708 (Core)
- Kernel: 3.10.0-693.2.2.el7.x86_64
- Hosts: vm1 (eth0: 172.31.0.106), vm2 (eth0: 172.31.0.107)
1) Minimal unicast VXLAN between two hosts
Goal: create a single L2 segment (VNI) across two nodes and assign IPs from 10.0.0.0/24 on the virtual interface. Linux routes 10.0.0.0/24 via the VXLAN device, encapsulating frames to the peer VTEP over the underlay (172.31.0.0/24).
On vm1:
# ip link add vxlan1 type vxlan id 1 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.106/24 dev vxlan1
On vm2:
# ip link add vxlan1 type vxlan id 1 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.107/24 dev vxlan1
Notes:
- id: VNI (here 1). Use a distinct VNI per overlay segment.
- remote: unicast peer VTEP. For multicast fabrics, use group instead.
- dstport: VXLAN UDP port. Linux historically defaulted to 8472; the IANA-assigned port is 4789. Specify it explicitly to avoid interoperability surprises.
- dev: egress underlay device.
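With a single remote configured, the kernel installs an all-zeros FDB entry on the VXLAN device that floods unknown-unicast and broadcast frames to that peer. You can inspect it (a sketch; assumes the vxlan1 device created above):

```shell
# The 00:00:00:00:00:00 entry is the default flood destination (the remote VTEP)
bridge fdb show dev vxlan1

# Remote MACs learned from decapsulated traffic also appear here.
# For unicast-only fabrics with several peers, additional flood
# destinations can be appended, e.g.:
# bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst <peer-VTEP-IP>
```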
Inspect the new interface (example):
# ifconfig vxlan1
vxlan1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.0.0.106 netmask 255.255.255.0 broadcast 0.0.0.0
ether 22:2d:c4:f0:c7:29 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
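Note the 1450-byte MTU: the driver reserves 50 bytes for the outer Ethernet/IP/UDP/VXLAN headers on a 1500-byte underlay. The detailed link view confirms the tunnel parameters (a sketch, run on the same host):

```shell
# -d prints type-specific details: vxlan id (VNI), remote, dstport, underlay dev
ip -d link show vxlan1
```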
Routing to the overlay (vm1):
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.31.0.253 0.0.0.0 UG 0 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 vxlan1
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
172.31.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
End-to-end reachability (vm1 → vm2 over the overlay):
# ping 10.0.0.107 -c 3
PING 10.0.0.107 (10.0.0.107) 56(84) bytes of data.
64 bytes from 10.0.0.107: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 10.0.0.107: icmp_seq=2 ttl=64 time=0.361 ms
64 bytes from 10.0.0.107: icmp_seq=3 ttl=64 time=0.394 ms
--- 10.0.0.107 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.361/0.400/0.447/0.042 ms
Capture encapsulated traffic on the underlay (vm1):
# tcpdump -i eth0 host 172.31.0.107 and udp port 4789 -s0 -vv -w vxlan_vni_1.pcap
Wireshark decodes UDP dst port 4789 as VXLAN by default. If using 8472, adjust Wireshark’s decode-as settings.
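tcpdump can also decode VXLAN inline on the standard port, showing the VNI and the inner frame (a sketch; exact output shape depends on the tcpdump version):

```shell
# With a ping running across the overlay in another terminal,
# expect lines resembling:
#   IP 172.31.0.106.51234 > 172.31.0.107.4789: VXLAN, flags [I] (0x08), vni 1
#   IP 10.0.0.106 > 10.0.0.107: ICMP echo request ...
tcpdump -i eth0 -nn udp port 4789
```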
2) Cross-host container L2 with VXLAN and Docker bridges
Containers on the same host can reach each other via a Linux bridge created by Docker. Containers on different hosts need an overlay. Attach a VXLAN device to each host’s Docker bridge to extend that L2 segment across hosts.
2.1 Prepare Docker networks and containers
Docker creates docker0 (default 172.17.0.0/16). Create an additional custom bridge so we can pick addresses explicitly:
# docker network create --subnet 172.18.0.0/16 mynetwork
# docker network ls
NETWORK ID NAME DRIVER SCOPE
1cb284a6cb33 bridge bridge local
069538be0246 host host local
3231f89d69f6 mynetwork bridge local
0b7934996485 none null local
The custom bridge appears with an interface similar to br-<NETWORK-ID>:
# ifconfig br-3231f89d69f6
br-3231f89d69f6: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.18.0.1 netmask 255.255.0.0 broadcast 172.18.255.255
ether 02:42:97:22:a5:f9 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Start a container on vm1 with a static address:
# docker run -itd --net mynetwork --ip 172.18.0.2 centos
16bbaeaaebfccd2a497e3284600f5c0ce230e89678e0ff92f6f4b738c6349f8d
Inspect from inside the container (install net-tools if needed):
# docker exec -it 16bbaeaaebfc /bin/bash
[root@16bbaeaaebfc /]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.0.2 netmask 255.255.0.0 broadcast 172.18.255.255
ether 02:42:ac:12:00:02 txqueuelen 0 (Ethernet)
RX packets 3319 bytes 19221325 (18.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2015 bytes 132903 (129.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
On vm2, create another container on the same custom network with 172.18.0.3. Before adding VXLAN, cross-host ping fails as expected:
[root@16bbaeaaebfc /]# ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
From 172.18.0.2 icmp_seq=1 Destination Host Unreachable
From 172.18.0.2 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms
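The vm2 container referenced above is created the same way as on vm1; a sketch (network name and subnet assumed to match vm1, image arbitrary):

```shell
# On vm2: the same 172.18.0.0/16 subnet, so both containers
# land on one L2 segment once VXLAN is attached
docker network create --subnet 172.18.0.0/16 mynetwork
docker run -itd --net mynetwork --ip 172.18.0.3 centos
```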
docker0/mynetwork relationships on the host (example):
# brctl show
bridge name bridge id STP enabled interfaces
br-3231f89d69f6 8000.02429722a5f9 no veth2fa4c50
docker0 8000.024244e874e8 no vethc7cd982
vethd3d0c18
Each running container contributes one veth peer that is enslaved to the appropriate Linux bridge.
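To map a host-side veth to its container, compare interface indices: the container’s eth0 reports the ifindex of its host-side peer via iflink (a sketch; the container ID from above is assumed):

```shell
# Inside the container: print the ifindex of eth0's host-side peer
docker exec 16bbaeaaebfc cat /sys/class/net/eth0/iflink

# On the host: find the veth whose leading ifindex matches that number
ip -o link | grep '^<N>:'   # substitute the number printed above
```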
2.2 Attach a VXLAN device to each Docker bridge
Create a VXLAN interface per host for a shared VNI, then enslave it to the custom Docker bridge. Frames from containers hitting the bridge can traverse the overlay via the VXLAN port.
On vm1:
# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-3231f89d69f6 vxlan_docker
# # Alternatively (modern):
# ip link set vxlan_docker master br-3231f89d69f6
On vm2:
# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-f4b35af34313 vxlan_docker
# # Alternatively:
# ip link set vxlan_docker master br-f4b35af34313
Validate cross-host connectivity from vm1’s container:
# docker exec -it 16bbaeaaebfc ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.0.2 netmask 255.255.0.0 broadcast 172.18.255.255
ether 02:42:ac:12:00:02 txqueuelen 0 (Ethernet)
RX packets 3431 bytes 19230266 (18.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2132 bytes 141908 (138.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# docker exec -it 16bbaeaaebfc ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.544 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.396 ms
--- 172.18.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.396/0.470/0.544/0.074 ms
# docker exec -it 16bbaeaaebfc ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.072 ms
--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
The RTT to the peer container reflects underlay traversal, while RTT to the bridge gateway (same host) reflects local stack processing.
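As in section 1, the container traffic can be confirmed as VXLAN-encapsulated on the underlay; a sketch for vm1:

```shell
# While the vm1 container pings 172.18.0.3, expect VXLAN packets
# with vni 200 whose inner frames carry the 172.18.0.0/16 addresses
tcpdump -i eth0 -nn udp port 4789
```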