Kubernetes Service Implementation: A Deep Dive into iptables Mode
Overview
This article continues our exploration of Kubernetes networking by examining how Service objects are implemented using iptables. Building on our previous discussion of CNI plugins and overlay networks, we'll trace how kube-proxy configures iptables rules to enable Service-to-Pod traffic routing.
Preparing Service and Pod Resources
The example setup uses a simple NGINX deployment with two replicas:
nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      component: web
  template:
    metadata:
      labels:
        component: web
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: NodePort
  selector:
    component: web
  ports:
    - protocol: TCP
      port: 80
      nodePort: 30007
Kubernetes Service Implementation Principles
Kubernetes supports multiple Service types:
- ClusterIP: Assigns an internal cluster IP, making the Service accessible only within the cluster
- NodePort: Exposes a static port on every node, allowing external access via <NodeIP>:<NodePort>
- LoadBalancer: Integrates with cloud load balancers to distribute external traffic
- ExternalName: Maps a Service to an external DNS name
kube-proxy Component
Each node in a Kubernetes cluster runs kube-proxy, which maintains network connectivity for Services. The component operates in several proxy modes:
iptables Mode
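This mode programs iptables rules to intercept Service traffic and redirect it to backend Pods. The proxy updates rules whenever Services or Pods change. While simple and widely adopted, this approach can suffer performance degradation in large clusters: rules are evaluated sequentially, so the first packet of each connection may have to traverse a long chain of rules before it finds a match, and the rule set grows with the number of Services and endpoints.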
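To make the scaling concern concrete, here is a minimal sketch of first-match evaluation over an ordered rule list. It is a toy model, not kube-proxy or netfilter code: the `Rule` class, the `first_match` helper, and the generated `10.96.x.y` addresses are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A simplified rule: match on destination IP and port, then jump to target."""
    dst_ip: str
    dst_port: int
    target: str


def first_match(rules, dst_ip, dst_port):
    """Scan rules in order; return (target, number_of_rules_examined)."""
    for examined, rule in enumerate(rules, start=1):
        if rule.dst_ip == dst_ip and rule.dst_port == dst_port:
            return rule.target, examined
    return None, len(rules)


# Hypothetical rule set: one rule per Service, 1000 Services with made-up IPs.
rules = [Rule(f"10.96.{i // 256}.{i % 256}", 80, f"KUBE-SVC-{i}") for i in range(1000)]

# A packet for the last Service in the list is compared against every rule.
target, examined = first_match(rules, "10.96.3.231", 80)
print(target, examined)
```

In this model the cost of a lookup is proportional to the position of the matching rule, which is the behavior the paragraph above describes.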
IPVS (IP Virtual Server) Mode
IPVS leverages the kernel's IPVS subsystem for built-in load balancing. It handles higher traffic volumes with better performance and supports sophisticated scheduling algorithms like least connections. In this mode, kube-proxy creates a virtual server with a VIP for each Service and distributes traffic across backend Pods.
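As an illustration of what "least connections" means, the following sketch picks the backend with the fewest active connections. It only conveys the idea: the real scheduler runs inside the kernel, and the `pick_least_connections` helper and the connection counts below are assumptions made up for this example.

```python
def pick_least_connections(active):
    """Return the backend with the fewest active connections.

    Ties go to the backend that appears first in the mapping.
    """
    return min(active, key=active.get)


# Hypothetical connection counts for the two Pods used in this article.
active = {"10.244.1.5:80": 3, "10.244.2.4:80": 1}

backend = pick_least_connections(active)
active[backend] += 1  # the new connection is now tracked on that backend
print(backend, active)
```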
This article focuses on iptables mode to illustrate how traffic flows from a Service to its Pods.
iptables Fundamentals
iptables is a user-space tool for configuring the Linux kernel's netfilter framework. It allows administrators to define packet filtering, NAT, and port forwarding rules. Key capabilities include:
- Packet filtering: Controlling which packets pass through network interfaces
- Network Address Translation (NAT): Modifying source or destination addresses
- Port forwarding: Redirecting traffic to specific ports
iptables Rule Analysis
Service, Pod, and Host Configuration
After deploying the resources above, the environment shows:
Services:
NAME          TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
web-service   NodePort   10.107.33.105   <none>        80:30007/TCP   152m
Pods:
NAME                        READY   STATUS    RESTARTS   AGE    IP           NODE
web-frontend-9d7f5c-lmnop   1/1     Running   0          155m   10.244.2.4   node-03
web-frontend-9d7f5c-qrstu   1/1     Running   0          155m   10.244.1.5   node-02
Cluster Nodes:
192.168.49.2 node-01
192.168.49.3 node-02
192.168.49.4 node-03
Tracing iptables Rules from NodePort
On the primary node, examine the nat table filtered by the NodePort 30007:
sudo iptables -t nat -L | grep 30007
Output:
KUBE-EXT-ABCD1234EFGH5678 tcp -- anywhere anywhere /* web/web-service */ tcp dpt:30007
This reveals a chain named KUBE-EXT-ABCD1234EFGH5678. Inspect its contents:
sudo iptables -t nat -L KUBE-EXT-ABCD1234EFGH5678 -v
Result:
Chain KUBE-EXT-ABCD1234EFGH5678 (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-ABCD1234EFGH5678 all -- any any anywhere anywhere
The chain references KUBE-SVC-ABCD1234EFGH5678. Examining that chain:
sudo iptables -t nat -L KUBE-SVC-ABCD1234EFGH5678 -v
Output:
Chain KUBE-SVC-ABCD1234EFGH5678 (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SEP-PQRSTU9876543210 all -- any any anywhere anywhere /* web/web-service -> 10.244.1.5:80 */ statistic mode random probability 0.50000000000
0 0 KUBE-SEP-VWXYZA1234567890 all -- any any anywhere anywhere /* web/web-service -> 10.244.2.4:80 */
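This chain branches into two KUBE-SEP-* endpoint chains, one per Pod. The first rule matches with probability 0.5; anything it does not match falls through to the second, unconditional rule, so each Pod receives roughly half of the new connections. Inspect the first endpoint: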
sudo iptables -t nat -L KUBE-SEP-PQRSTU9876543210 -v
Result:
Chain KUBE-SEP-PQRSTU9876543210 (1 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- any any anywhere anywhere /* web/web-service */ tcp to:10.244.1.5:80
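The chain terminates with a DNAT target that rewrites the destination to 10.244.1.5:80. According to the rule comment in the KUBE-SVC-* chain, the second endpoint chain, KUBE-SEP-VWXYZA1234567890, targets 10.244.2.4:80. Traffic matching this Service therefore ultimately reaches one of the two Pod endpoints.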
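The effect of the DNAT target can be sketched as a rewrite of the packet's destination while the source stays untouched. This is a simplified model, not netfilter code: the `Packet` class and `dnat` function are hypothetical, and the client address 192.168.49.1 with source port 51000 is an assumed value that does not appear in the listings above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def dnat(packet, to_ip, to_port):
    """Return a copy of the packet with only the destination rewritten."""
    return replace(packet, dst_ip=to_ip, dst_port=to_port)


# Assumed client talking to the NodePort on node-01 (192.168.49.2:30007).
original = Packet("192.168.49.1", 51000, "192.168.49.2", 30007)

# What the KUBE-SEP-PQRSTU9876543210 rule does: tcp to:10.244.1.5:80
translated = dnat(original, "10.244.1.5", 80)
print(original)
print(translated)
```

The kernel's connection tracking remembers the mapping, so reply packets from the Pod can be translated back before they reach the client.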
Tracing from PREROUTING and OUTPUT Chains
Examining PREROUTING:
sudo iptables -t nat -L PREROUTING -v
Output:
Chain PREROUTING (policy ACCEPT 5 packets, 300 bytes)
pkts bytes target prot opt in out source destination
99 5965 KUBE-SERVICES all -- any any anywhere anywhere /* kubernetes service portals */
All inbound traffic passes through KUBE-SERVICES. Checking OUTPUT:
sudo iptables -t nat -L OUTPUT -v
Output:
Chain OUTPUT (policy ACCEPT 3355 packets, 202K bytes)
pkts bytes target prot opt in out source destination
29961 1801K KUBE-SERVICES all -- any any anywhere anywhere /* kubernetes service portals */
Similarly, all locally generated traffic routes through KUBE-SERVICES. Inspecting that chain:
sudo iptables -t nat -L KUBE-SERVICES -v
Output:
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-ABCD1234EFGH5678 tcp -- any any anywhere 10.107.33.105 /* web/web-service cluster IP */ tcp dpt:http
3391 203K KUBE-NODEPORTS all -- any any anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Two branches exist: one for the Service's Cluster IP (KUBE-SVC-ABCD1234EFGH5678) and another for node ports (KUBE-NODEPORTS). Since we analyzed the first branch earlier, let's examine KUBE-NODEPORTS:
sudo iptables -t nat -L KUBE-NODEPORTS -v
Output:
Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-EXT-ABCD1234EFGH5678 tcp -- any any anywhere anywhere /* web/web-service */ tcp dpt:30007
When traffic targets TCP port 30007, it matches KUBE-EXT-ABCD1234EFGH5678, which we already traced back to the same KUBE-SVC-* chain. Each newly created NodePort Service adds an entry to KUBE-NODEPORTS linking to a KUBE-EXT-* chain.
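Following these jumps by hand gets tedious, so a small helper can pull the target column out of a listing. The sketch below assumes the column layout printed by `iptables -L <chain> -v` as shown in this article (pkts, bytes, target, ...); the `jump_targets` function is a hypothetical helper, and parsing human-readable output like this is brittle, for example when a rule has no target.

```python
from typing import List


def jump_targets(listing: str) -> List[str]:
    """Extract the target column from the rule rows of an `iptables -L -v` listing."""
    targets = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "Chain ..." title, the column header, and anything too short.
        if len(fields) < 9 or fields[0] in ("Chain", "pkts"):
            continue
        targets.append(fields[2])  # third column is the target
    return targets


# The KUBE-SERVICES listing from this article, pasted verbatim.
sample = """\
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-ABCD1234EFGH5678 tcp -- any any anywhere 10.107.33.105 /* web/web-service cluster IP */ tcp dpt:http
3391 203K KUBE-NODEPORTS all -- any any anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
"""

print(jump_targets(sample))
```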
Rule Hierarchy Summary
All packets arriving at a node (PREROUTING) or generated locally on it (OUTPUT) traverse KUBE-SERVICES. Routing decisions then depend on destination:
- Traffic destined for a Service's Cluster IP matches KUBE-SVC-* directly
- Traffic addressed to one of the node's own addresses (ADDRTYPE match dst-type LOCAL) reaches KUBE-NODEPORTS, which further matches based on TCP destination port, routing to KUBE-EXT-* chains
Both paths converge at KUBE-SVC-* chains. These chains use probabilistic distribution to spread traffic across KUBE-SEP-* endpoint chains, each representing a Pod. KUBE-SEP-* chains perform the final DNAT operation, redirecting original Service requests to actual Pod endpoints.
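The probabilities in a KUBE-SVC-* chain are per rule, not per Pod: a rule is only evaluated if every rule before it declined the packet. A short calculation shows why 0.5 followed by an unconditional rule gives an even split, and how the same scheme could extend to more endpoints. The `1 / (n - i)` pattern is an assumption here, inferred from the two-endpoint listing above; the function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints (assumed pattern: 1 / (n - i)).

    The last entry is 1, i.e. an unconditional rule.
    """
    return [Fraction(1, n - i) for i in range(n)]


def overall_shares(n):
    """Share of new connections each endpoint receives under sequential evaluation."""
    shares = []
    remaining = Fraction(1)  # probability that a packet reaches the current rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(2), overall_shares(2))
print(rule_probabilities(3), overall_shares(3))
```

For two endpoints this reproduces the listing: the first rule carries probability 1/2, the second is unconditional, and each endpoint ends up with half of the new connections.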
With two replicas, accessing the Service via NodePort or ClusterIP triggers this traversal sequence, achieving load-balanced distribution across backend Pods.
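The whole traversal can be modeled in a few lines. The sketch below encodes the chains from the listings as ordered (match, target) pairs and walks them from KUBE-SERVICES down to a DNAT. It is a deliberately simplified model: returning from a chain to its caller is not modeled, the packet is a plain dict holding only a destination, and `NODE_IPS` assumes the walk happens on node-01. `build_chains` and `traverse` are hypothetical names.

```python
import random

# Chain names and addresses copied from the listings in this article.
SVC = "KUBE-SVC-ABCD1234EFGH5678"
EXT = "KUBE-EXT-ABCD1234EFGH5678"
SEP_A = "KUBE-SEP-PQRSTU9876543210"
SEP_B = "KUBE-SEP-VWXYZA1234567890"
NODE_IPS = {"192.168.49.2"}  # assumption: addresses local to node-01


def build_chains(rng):
    """Return the chains as ordered lists of (match_function, target) pairs."""
    return {
        "KUBE-SERVICES": [
            (lambda p: p["dst"] == ("10.107.33.105", 80), SVC),
            (lambda p: p["dst"][0] in NODE_IPS, "KUBE-NODEPORTS"),
        ],
        "KUBE-NODEPORTS": [(lambda p: p["dst"][1] == 30007, EXT)],
        EXT: [(lambda p: True, SVC)],
        SVC: [
            (lambda p: rng.random() < 0.5, SEP_A),  # statistic mode random 0.5
            (lambda p: True, SEP_B),                # unconditional fallback
        ],
        SEP_A: [(lambda p: True, ("DNAT", "10.244.1.5", 80))],
        SEP_B: [(lambda p: True, ("DNAT", "10.244.2.4", 80))],
    }


def traverse(chains, packet, chain="KUBE-SERVICES"):
    """Follow the first matching rule in each chain; return (path, destination)."""
    path = [chain]
    while True:
        for match, target in chains[chain]:
            if match(packet):
                break
        else:
            return path, packet["dst"]  # nothing matched: destination unchanged
        if isinstance(target, tuple):   # terminal DNAT rule
            return path, (target[1], target[2])
        chain = target
        path.append(chain)


chains = build_chains(random.Random())
for dst in [("192.168.49.2", 30007), ("10.107.33.105", 80)]:
    path, new_dst = traverse(chains, {"dst": dst})
    print(dst, "->", new_dst, "via", " > ".join(path))
```

Running it a few times shows both entry points converging on the same KUBE-SVC-* chain and landing on either Pod.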