Fading Coder

One Final Commit for the Last Sprint

VXLAN on Linux: Point-to-Point Tunnels and Cross-Host Docker Networking

Linux VXLAN support

The in-kernel VXLAN driver landed in Linux 3.7 (2012). Production deployments typically target ≥3.9/3.10 for maturity and feature coverage. The driver supports IPv4/IPv6 underlays, unicast and multicast flood-and-learn, and integrates with iproute2.

Check availability by inspecting iproute2’s link types:

man ip-link
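The man page documents the vxlan link type; it also helps to confirm the driver is actually present on the running kernel. A quick diagnostic sketch (assumes a modular kernel build; with CONFIG_VXLAN=y, modprobe is a no-op and lsmod shows nothing, which is fine):

```shell
# Load the driver if it is built as a module, then confirm it is present
modprobe vxlan
lsmod | grep vxlan

# iproute2's usage text lists the link types it can create
ip link help 2>&1 | grep -w vxlan
```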

Test environment:

  • OS: CentOS Linux release 7.4.1708 (Core)
  • Kernel: 3.10.0-693.2.2.el7.x86_64
  • Hosts: vm1 (eth0: 172.31.0.106), vm2 (eth0: 172.31.0.107)

1) Minimal unicast VXLAN between two hosts

Goal: create a single L2 segment (VNI) across two nodes and assign IPs from 10.0.0.0/24 on the virtual interface. Linux routes 10.0.0.0/24 via the VXLAN device, encapsulating frames to the peer VTEP over the underlay (172.31.0.0/24).

On vm1:

# ip link add vxlan1 type vxlan id 1 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.106/24 dev vxlan1

On vm2:

# ip link add vxlan1 type vxlan id 1 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan1 up
# ip addr add 10.0.0.107/24 dev vxlan1

Notes:

  • id: VNI (here 1). Use a distinct VNI per overlay segment.
  • remote: unicast peer VTEP. For multicast fabrics, use group instead.
  • dstport: VXLAN UDP port. Linux historically used 8472; IANA standard is 4789. Specify explicitly when needed.
  • dev: egress underlay device.
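Once the device exists, the configured VXLAN attributes can be read back with iproute2's detail flag, e.g.:

```shell
# -d (details) prints type-specific attributes: VNI, remote, dstport, dev
ip -d link show vxlan1
```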

Inspect the new interface (example):

# ifconfig vxlan1
vxlan1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.0.0.106  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 22:2d:c4:f0:c7:29  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
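Note the MTU of 1450 in the output above: the kernel derives it from the underlay device's 1500-byte MTU minus the encapsulation overhead. A quick sketch of the arithmetic for an IPv4 underlay:

```shell
# Bytes added in front of each inner Ethernet frame on an IPv4 underlay:
#   outer IPv4 header (20) + UDP (8) + VXLAN header (8) + inner Ethernet (14)
OVERHEAD=$((20 + 8 + 8 + 14))
UNDERLAY_MTU=1500
OVERLAY_MTU=$((UNDERLAY_MTU - OVERHEAD))
echo "$OVERLAY_MTU"   # 1450, matching the vxlan1 MTU shown above
```

With an IPv6 underlay the outer header is 40 bytes, so the overlay MTU drops to 1430; a jumbo-frame underlay avoids the squeeze entirely.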

Routing to the overlay (vm1):

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.0.253    0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 vxlan1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
172.31.0.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0

End-to-end reachability (vm1 → vm2 over the overlay):

# ping 10.0.0.107 -c 3
PING 10.0.0.107 (10.0.0.107) 56(84) bytes of data.
64 bytes from 10.0.0.107: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 10.0.0.107: icmp_seq=2 ttl=64 time=0.361 ms
64 bytes from 10.0.0.107: icmp_seq=3 ttl=64 time=0.394 ms

--- 10.0.0.107 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.361/0.400/0.447/0.042 ms

Capture encapsulated traffic on the underlay (vm1):

# tcpdump -i eth0 host 172.31.0.107 and udp port 4789 -s0 -vv -w vxlan_vni_1.pcap

Wireshark decodes UDP dst port 4789 as VXLAN by default. If using 8472, adjust Wireshark’s decode-as settings.
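On the command line, tshark can apply the same decode-as override; a sketch, assuming the capture file from the previous step and the legacy port 8472:

```shell
# Force frames on UDP port 8472 to be dissected as VXLAN
tshark -r vxlan_vni_1.pcap -d udp.port==8472,vxlan
```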


2) Cross-host container L2 with VXLAN and Docker bridges

Containers on the same host can reach each other via a Linux bridge created by Docker. Containers on different hosts need an overlay. Attach a VXLAN device to each host’s Docker bridge to extend that L2 segment across hosts.

2.1 Prepare Docker networks and containers

Docker creates docker0 (default 172.17.0.0/16). Create an additional custom bridge so we can pick addresses explicitly:

# docker network create --subnet 172.18.0.0/16 mynetwork
# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1cb284a6cb33        bridge              bridge              local
069538be0246        host                host                local
3231f89d69f6        mynetwork           bridge              local
0b7934996485        none                null                local
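docker network inspect confirms the subnet assignment; a minimal check (the Go template pulls just the IPAM subnet):

```shell
docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}' mynetwork
# expected: 172.18.0.0/16
```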

The custom bridge appears with an interface named br-<NETWORK-ID> (the first 12 characters of the network ID):

# ifconfig br-3231f89d69f6
br-3231f89d69f6: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:97:22:a5:f9  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Start a container on vm1 with a static address:

# docker run -itd --net mynetwork --ip 172.18.0.2 centos
16bbaeaaebfccd2a497e3284600f5c0ce230e89678e0ff92f6f4b738c6349f8d

Inspect from inside the container (install net-tools if needed):

# docker exec -it 16bbaeaaebfc /bin/bash
[root@16bbaeaaebfc /]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3319  bytes 19221325 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2015  bytes 132903 (129.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

On vm2, create another container on the same custom network with 172.18.0.3. Before adding VXLAN, cross-host ping fails as expected:

[root@16bbaeaaebfc /]# ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
From 172.18.0.2 icmp_seq=1 Destination Host Unreachable
From 172.18.0.2 icmp_seq=2 Destination Host Unreachable

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms

docker0/mynetwork relationships on the host (example):

# brctl show
bridge name          bridge id           STP enabled     interfaces
br-3231f89d69f6      8000.02429722a5f9   no              veth2fa4c50
docker0              8000.024244e874e8   no              vethc7cd982
                                                         vethd3d0c18

Each running container contributes one veth peer that is enslaved to the appropriate Linux bridge.
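brctl comes from the legacy bridge-utils package; the same bridge membership can be listed with iproute2, for example:

```shell
# List ports enslaved to the custom bridge (modern replacement for brctl show)
ip link show master br-3231f89d69f6

# Or list all bridge ports on the host
bridge link show
```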

2.2 Attach a VXLAN device to each Docker bridge

Create a VXLAN interface per host for a shared VNI, then enslave it to the custom Docker bridge. Frames from containers hitting the bridge can traverse the overlay via the VXLAN port.

On vm1:

# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.107 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-3231f89d69f6 vxlan_docker
# # Alternatively (modern):
# ip link set vxlan_docker master br-3231f89d69f6

On vm2:

# ip link add vxlan_docker type vxlan id 200 remote 172.31.0.106 dstport 4789 dev eth0
# ip link set vxlan_docker up
# brctl addif br-f4b35af34313 vxlan_docker
# # Alternatively:
# ip link set vxlan_docker master br-f4b35af34313
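After enslaving, the bridge floods unknown-destination frames out the VXLAN port, and the VXLAN driver's default FDB entry (the all-zeros MAC) forwards those floods to the configured remote VTEP. A quick verification sketch on either host:

```shell
# The all-zeros entry is the flood destination toward the peer VTEP
bridge fdb show dev vxlan_docker

# Confirm the VXLAN port joined the bridge
bridge link show dev vxlan_docker
```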

Validate cross-host connectivity from vm1’s container:

# docker exec -it 16bbaeaaebfc ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 172.18.255.255
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 3431  bytes 19230266 (18.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2132  bytes 141908 (138.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# docker exec -it 16bbaeaaebfc ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.544 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.396 ms

--- 172.18.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.396/0.470/0.544/0.074 ms

# docker exec -it 16bbaeaaebfc ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 172.18.0.1: icmp_seq=2 ttl=64 time=0.072 ms

--- 172.18.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms

The RTT to the peer container reflects underlay traversal, while the RTT to the bridge gateway (same host) reflects local stack processing.
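Teardown is symmetric: deleting a VXLAN device also removes it from its bridge. A cleanup sketch for each host:

```shell
# Remove the overlay devices created above (run on both hosts)
ip link del vxlan_docker
ip link del vxlan1
```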
