Kubernetes Pods: The Fundamental Building Block in Container Orchestration
Why Do We Need Pods?
Throughout your journey with Kubernetes, you may have repeatedly asked yourself: why do we need Pods? We have invested significant effort in understanding the principles of Linux containers—the mantra of "Namespaces for isolation, Cgroups for resource limits, and rootfs for the filesystem" is now second nature. So why did the Kubernetes project introduce the concept of a Pod?
To answer this, let's revisit the core nature of a container: a container is, at its heart, a process. In the cloud-computing analogy, containers are the processes, container images are the executable installers, and Kubernetes is the operating system.
Consider the process tree on a typical Linux machine by running pstree -g. The output shows processes grouped together, like rsyslogd and its related kernel log module imklog within process group 1632. These processes collaborate to fulfill the duties of the rsyslogd program. (Note: the "processes" here are technically threads or lightweight processes in the Linux context, and "process groups" are thread groups—a historical naming nuance.)
Kubernetes maps this "process group" concept to container technology, making it a first-class citizen in its cloud operating system. Why? Because in real-world application deployments, there are often groups of containers that share a tightly collaborative, "super-close" relationship. They must be deployed on the same machine to work correctly.
The Gang Scheduling Problem
A classic example is rsyslogd, which comprises three modules: imklog, imuxsock, and the main function process. If containerized under Docker's single-process model, each module becomes a separate container, each with a memory quota of, say, 1 GB. (The single-process model doesn't limit a container to a single process, but rather means the container lacks the ability to manage multiple processes effectively, as the user application as PID=1 cannot perform process management like an init system.)
Now imagine a two-node cluster: node-1 has 3 GB of available memory, node-2 has 2.5 GB. To ensure all three containers run on the same machine using Docker Swarm, you'd set an affinity=main constraint on the imklog and imuxsock containers. However, during scheduling, main and imklog might be assigned to node-2, leaving only 0.5 GB—insufficient for imuxsock. The affinity constraint forces it onto node-2 anyway, causing a scheduling failure. This is a classic gang scheduling problem, and neither resource hoarding (which wastes resources and risks deadlock) nor optimistic scheduling (which is complex to implement and roll back) solves it cleanly.
Kubernetes solves this elegantly: a Pod is the atomic unit of scheduling. The scheduler considers the aggregate resource requests of all containers in a Pod. For the rsyslogd example, the Pod requests 3 GB of memory, so it gets scheduled to node-1, respecting the need for co-location.
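A minimal sketch of what this looks like as a Pod spec; the container and image names here are illustrative, since there is no official per-module rsyslogd image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rsyslogd
spec:
  containers:
  - name: main
    image: rsyslogd-main        # hypothetical image for the main module
    resources:
      requests:
        memory: "1Gi"           # each container requests 1 GiB
  - name: imklog
    image: rsyslogd-imklog      # hypothetical
    resources:
      requests:
        memory: "1Gi"
  - name: imuxsock
    image: rsyslogd-imuxsock    # hypothetical
    resources:
      requests:
        memory: "1Gi"
# The scheduler sums all container requests (3 GiB) and places the
# whole Pod on a node with at least that much allocatable memory,
# i.e. node-1 in the example above.
```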
Super-Close Relationships and Pod Design
Containers with "super-close" relationships—such as sharing localhost communication, socket files, or Linux namespaces—naturally belong in the same Pod. Not all connected containers do; for example, a PHP app and a MySQL database communicate but don't need co-location, so they are better as separate Pods.
But why make the Pod itself a first-class citizen, at the cost of a steeper learning curve? The deeper reason lies in container design patterns.
Implementation: The Infra Container
A Pod is a logical concept, not a real isolation boundary. Kubernetes actually works with Linux namespaces and cgroups on the host. A Pod is implemented as a set of containers that share specific resources. All containers in a Pod share the same Network Namespace and can share Volumes.
The key to Pod implementation is an Infra container (using the k8s.gcr.io/pause image, a tiny ~100-200 KB image written in assembly that always stays "paused"). The Infra container is created first, and all other user containers join its Network Namespace via Docker's --network=container:<infra> option (or equivalent in other container runtimes).
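Outside Kubernetes, you can reproduce this namespace-sharing trick with plain Docker. A sketch, assuming a Docker daemon is available; the container names are arbitrary and the pause image tag is one of several published versions:

```shell
# Start an "infra" container that does nothing but hold the namespaces.
docker run -d --name pod-infra k8s.gcr.io/pause:3.1

# Join a second container to the infra container's network namespace;
# both now share the same IP address, interfaces, and localhost.
docker run -d --name pod-app --network=container:pod-infra nginx
```

Because the application container joins an existing Network Namespace rather than creating its own, it can be restarted or replaced without the Pod's network identity changing.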
This architecture means:
- Containers communicate via localhost.
- They see the same network devices as the Infra container.
- The Pod has a single IP address (from the Network Namespace).
- All network resources are shared across the Pod.
- The Pod's lifecycle is tied to the Infra container, not to user containers.
Incoming and outgoing traffic effectively passes through the Infra container's network stack. For network plugin developers, this means you only need to configure the Pod's Network Namespace, not individual containers.
Volume sharing is similarly simplified: volumes are defined at the Pod level. A host directory is shared by declaring the volume in the Pod spec and mounting it in each container.
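A sketch of Pod-level volume sharing; the container names, images, and the /data host path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  volumes:
  - name: shared-data
    hostPath:
      path: /data                 # a host directory, declared once at the Pod level
  containers:
  - name: writer
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data        # each container mounts the same volume
  - name: reader
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
```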
Container Design Patterns in Action
Thinking in terms of Pods encourages you to solve problems that are difficult with single containers alone.
Example 1: WAR File and Web Server
A Java WAR file needs to be deployed in Tomcat's webapps directory. With only Docker, options are limited:
- Build a custom Tomcat image with the WAR file embedded—inconvenient for updates.
- Mount a hostPath volume into the Tomcat container—requires managing the WAR file distribution on every node.
With a Pod, you can combine two containers:
- An Init Container (from an image containing only the WAR file) that copies the WAR file to a shared volume.
- The Tomcat container that mounts the same volume to its webapps directory.
Init Containers run to completion one at a time, in the order they are declared, before any user container starts. This pattern elegantly decouples the WAR file from the Tomcat image, so the two can be built and updated independently. This is the sidecar pattern, in which a helper container performs tasks auxiliary to the main container.
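A sketch of the pattern as a Pod spec; the WAR image name and file paths are illustrative, not from a published image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: javaweb
spec:
  initContainers:
  - name: war
    image: example/sample-war:v1           # hypothetical image containing only sample.war
    command: ["cp", "/sample.war", "/app"] # copy the WAR into the shared volume, then exit
    volumeMounts:
    - name: app-volume
      mountPath: /app
  containers:
  - name: tomcat
    image: tomcat:9
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: app-volume
      mountPath: /usr/local/tomcat/webapps # Tomcat sees the WAR placed by the init container
  volumes:
  - name: app-volume
    emptyDir: {}                           # exists only for the Pod's lifetime
```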
Example 2: Log Collection
An application writes logs to /var/log. You can run a sidecar container that:
- Mounts the same volume to its own /var/log.
- Continuously reads and forwards logs to a centralized system like MongoDB or Elasticsearch.
This keeps the main application container focused on its task while the sidecar handles log management.
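A sketch of the log-collection sidecar; both image names are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  volumes:
  - name: varlog
    emptyDir: {}
  containers:
  - name: app
    image: example/my-app:v1            # hypothetical application that writes to /var/log
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: log-forwarder
    image: example/log-forwarder:v1     # hypothetical sidecar that tails /var/log and ships entries
    volumeMounts:
    - name: varlog
      mountPath: /var/log               # same volume, so it sees the app's log files
```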
The shared Network Namespace also enables sidecars for network management (e.g., Istio's service mesh sidecars) without modifying the main container.
Key Takeaway
In the Kubernetes ecosystem, Pods serve as the analog to virtual machines in traditional infrastructures, while containers are the user programs running within them.
This article is based on content from the Kubernetes in Action series.