
Kernel-Space Load Balancing with Linux Virtual Server

Linux Virtual Server (LVS) is a kernel-integrated traffic distribution mechanism that pools multiple backend nodes into a unified, highly available cluster. Operating at the transport layer (Layer 4), it intercepts incoming network packets and distributes client requests across real servers according to configurable scheduling algorithms.

Architectural Components

  • Virtual Server (VS): The logical endpoint exposed to external clients, defined by a shared IP and port pair.
  • Director Server (DS): The central routing node that evaluates incoming traffic and applies load-balancing rules.
  • Real Server (RS): The physical or virtual machines executing the actual application workloads.
  • Network Addresses: CIP (Client), VIP (Cluster-facing), DIP (Internal director interface), RIP (Backend node interface).

Traffic Routing Paradigms

Network Address Translation (NAT)

In this model, the director rewrites both the destination IP/port on ingress and the source IP/port on egress. Traffic flows bidirectionally through the DS, making it suitable for smaller deployments where centralized control simplifies management. However, the director becomes a throughput bottleneck since all return traffic must traverse it.

Direct Routing (DR)

DR operates at the data-link layer. The director only modifies the destination MAC address of incoming frames to match a selected backend node. The RS processes the payload and replies directly to the client using its own network stack, bypassing the director entirely. This design drastically reduces latency and increases aggregate bandwidth, though it requires all nodes to reside on the same broadcast domain and demands careful ARP suppression to prevent IP conflicts.

IPVS Management Utility

The ipvsadm command-line interface manipulates the kernel's IPVS table. Core operations include:

  • -A/-D: Create or remove a virtual service definition.
  • -a/-d: Attach or detach a real server node.
  • -s: Assign a scheduling algorithm (rr for round-robin, wrr for weighted, lc for least connections, wlc for weighted least connections).
  • -g/-m/-i: Specify DR, NAT, or IP Tunneling modes respectively.
  • -w: Assign traffic distribution weights.
  • -p: Define session persistence duration.

Configuration Scenarios

NAT Deployment Example

# Enable kernel packet forwarding on the director
sysctl -w net.ipv4.ip_forward=1

# Apply SNAT for backend egress traffic
iptables -t nat -A POSTROUTING -s 10.0.2.0/24 -o eth1 -j MASQUERADE

# Define the virtual service and attach backends
ipvsadm -C
ipvsadm -A -t 203.0.113.10:80 -s wrr
ipvsadm -a -t 203.0.113.10:80 -r 10.0.2.51:80 -m -w 3
ipvsadm -a -t 203.0.113.10:80 -r 10.0.2.52:80 -m -w 1

# Persist the routing table
ipvsadm-save -n > /etc/sysconfig/ipvsadm

Direct Routing Implementation

Director node setup involves adding the VIP as an alias on the public interface and suppressing ICMP redirects:

ip addr add 203.0.113.100/32 dev eth0 label eth0:0
sysctl -w net.ipv4.conf.eth0.send_redirects=0
sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.ip_forward=0

ipvsadm -A -t 203.0.113.100:80 -s rr
ipvsadm -a -t 203.0.113.100:80 -r 10.0.1.11:80 -g
ipvsadm -a -t 203.0.113.100:80 -r 10.0.1.12:80 -g

Backend nodes require the same VIP bound to the loopback interface, coupled with ARP tuning to prevent MAC address collisions:

ip addr add 203.0.113.100/32 dev lo label lo:0
ip route add 203.0.113.100 dev lo

echo "net.ipv4.conf.all.arp_ignore=1" >> /etc/sysctl.conf
echo "net.ipv4.conf.lo.arp_ignore=1" >> /etc/sysctl.conf
echo "net.ipv4.conf.all.arp_announce=2" >> /etc/sysctl.conf
echo "net.ipv4.conf.lo.arp_announce=2" >> /etc/sysctl.conf
sysctl -p

High Availability & Failover Mechanisms

LVS itself does not monitor director health. Integrating Keepalived provides automatic VIP migration through the VRRP protocol: a primary node holds the virtual address while a standby node remains idle, and heartbeat packets exchanged over multicast determine node liveness. If the primary fails, the standby assumes the VIP within seconds. Split-brain scenarios, where both nodes claim the VIP simultaneously, typically stem from network segmentation, firewall rules blocking VRRP traffic, or desynchronized system clocks. Proper configuration of vrrp_strict, priority tiers, and unicast peers mitigates these risks.

Keepalived configuration structure:

vrrp_instance HA_CLUSTER {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecureVRRP2024
    }
    virtual_ipaddress { 203.0.113.100 }
}

virtual_server 203.0.113.100 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    protocol TCP
    real_server 10.0.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_port 80
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }
}
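Where the network filters multicast, the same instance can exchange VRRP advertisements over unicast instead, which also removes one common split-brain trigger. A sketch extending the vrrp_instance block above; the peer addresses are hypothetical:

vrrp_instance HA_CLUSTER {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    unicast_src_ip 10.0.1.5      # this director's own address (hypothetical)
    unicast_peer {
        10.0.1.6                 # the standby director (hypothetical)
    }
    virtual_ipaddress { 203.0.113.100 }
}

The standby node carries the mirrored configuration with state BACKUP, a lower priority, and the two unicast addresses swapped.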

Architectural Comparisons & Multi-Tier Designs

While LVS operates strictly at Layer 4 (TCP/UDP), Nginx and HAProxy provide Layer 7 (HTTP/HTTPS) content-aware routing. Modern infrastructures frequently combine these tools: LVS handles massive raw packet distribution across a pool of Nginx reverse proxies, which then perform URL-based routing, TLS termination, and caching before forwarding dynamic requests to application servers.
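As a sketch of that middle tier, each Nginx reverse proxy (itself a DR real server carrying the VIP on its loopback) might split static and dynamic traffic like this; all addresses and paths here are hypothetical:

# Hypothetical Nginx proxy tier behind LVS: serve static assets locally,
# forward everything else to the application pool
upstream app_servers {
    server 10.0.3.21:8080;   # application node (hypothetical)
    server 10.0.3.22:8080;
}

server {
    listen 80;

    location /static/ {
        root /var/www;
    }
    location / {
        proxy_pass http://app_servers;
    }
}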

HAProxy excels in high-concurrency TCP/HTTP balancing, supporting connection pooling, advanced health checks, and session persistence without relying on kernel modifications. Its configuration separates frontend listeners from backend server pools:

frontend web_gateway
    bind *:80
    mode tcp
    default_backend app_nodes

backend app_nodes
    mode tcp
    balance roundrobin
    option tcp-check
    server node_a 10.0.1.11:80 check inter 3000 fall 3 weight 2
    server node_b 10.0.1.12:80 check inter 3000 fall 3 weight 3

System scalability follows two paths: vertical scaling (upgrading CPU/RAM on a single node, bound by hardware limits) and horizontal scaling (adding nodes to a cluster, relying on network topology for inter-node communication).

Reliability metrics such as MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) quantify cluster resilience, with availability calculated as A = MTBF / (MTBF + MTTR). Achieving "four nines" (99.99%) availability restricts unplanned downtime to roughly 52 minutes annually, necessitating automated failover and rigorous operational monitoring.
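The availability formula is easy to sanity-check numerically. A minimal sketch using awk, with hypothetical figures: an MTBF of 8,760 hours (one failure per year) and an MTTR of one hour.

```shell
# A = MTBF / (MTBF + MTTR); the figures below are hypothetical
mtbf=8760   # hours between failures (one failure per year)
mttr=1      # hours to detect and repair

awk -v b="$mtbf" -v r="$mttr" 'BEGIN {
    a = b / (b + r)
    printf "availability:      %.6f\n", a
    printf "downtime per year: %.1f min\n", (1 - a) * 365 * 24 * 60
    # For comparison, the total "four nines" downtime budget:
    printf "99.99%% budget:     %.2f min\n", (1 - 0.9999) * 365 * 24 * 60
}'
```

With these numbers the cluster reaches roughly "three nines": a single hour of annual repair time already consumes more than the entire four-nines budget, which is why failover must be automated rather than manual.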

Tags: Linux
