Fading Coder

One Final Commit for the Last Sprint

Kubernetes Pods: The Fundamental Building Block in Container Orchestration

Tech · May 14

Why Do We Need Pods?

Throughout your journey with Kubernetes, you may have repeatedly asked yourself: why do we need Pods? We have invested significant effort in understanding the principles of Linux containers—the mantra of "Namespaces for isolation, Cgroups for resource limits, and rootfs for the filesystem" is now second nature. So why did the Kubernetes project introduce the concept of a Pod?

To answer this, let's revisit the core nature of a container: a container is, at its heart, a process. In the emerging cloud computing ecosystem, containers are the processes, container images are the installation packages, and Kubernetes is the operating system.

Consider the process tree on a typical Linux machine by running pstree -g. The output shows processes grouped together, like rsyslogd and its related kernel log module imklog within process group 1632. These processes collaborate to fulfill the duties of the rsyslogd program. (Note: the "processes" here are technically threads or lightweight processes in the Linux context, and "process groups" are thread groups—a historical naming nuance.)

Kubernetes maps this "process group" concept to container technology, making it a first-class citizen in its cloud operating system. Why? Because in real-world application deployments, there are often groups of containers that share a tightly collaborative, "super-close" relationship. They must be deployed on the same machine to work correctly.

The Gang Scheduling Problem

A classic example is rsyslogd, which comprises three modules: imklog, imuxsock, and the main function process. If containerized under Docker's single-process model, each module becomes a separate container, each with a memory quota of, say, 1 GB. (The single-process model doesn't mean a container can run only one process; rather, it means the container lacks the ability to manage multiple processes, because the user application runs as PID 1 and cannot take on the responsibilities of an init system.)

Now imagine a two-node cluster: node-1 has 3 GB of available memory, node-2 has 2.5 GB. To ensure all three containers run on the same machine using Docker Swarm, you'd set an affinity=main constraint on the imklog and imuxsock containers. However, during scheduling, main and imklog might be assigned to node-2, leaving only 0.5 GB—insufficient for imuxsock. The affinity constraint forces it onto node-2 anyway, causing a scheduling failure. This is a classic gang scheduling problem, and neither of the usual workarounds solves it cleanly: resource hoarding introduces inefficiency and risks deadlock, while optimistic scheduling is complex to implement.

Kubernetes solves this elegantly: a Pod is the atomic unit of scheduling. The scheduler considers the aggregate resource requests of all containers in a Pod. For the rsyslogd example, the Pod requests 3 GB of memory, so it gets scheduled to node-1, respecting the need for co-location.
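A minimal sketch of the rsyslogd Pod illustrates this (the image names below are hypothetical, for illustration only). The scheduler sums the three 1 GB requests and therefore only considers nodes with at least 3 GB free:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rsyslogd
spec:
  containers:
  - name: main
    image: example/rsyslog-main      # hypothetical image name
    resources:
      requests:
        memory: "1Gi"
  - name: imklog
    image: example/rsyslog-imklog    # hypothetical image name
    resources:
      requests:
        memory: "1Gi"
  - name: imuxsock
    image: example/rsyslog-imuxsock  # hypothetical image name
    resources:
      requests:
        memory: "1Gi"
```

Because the Pod is the atomic unit of scheduling, all three containers land on node-1 together or not at all—there is no partial placement.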

Super-Close Relationships and Pod Design

Containers with "super-close" relationships—such as sharing localhost communication, socket files, or Linux namespaces—naturally belong in the same Pod. Not all connected containers do; for example, a PHP app and a MySQL database communicate but don't need co-location, so they are better as separate Pods.

But why make Pods the first-class citizen, raising the learning curve? The deeper reason is the container design pattern.

Implementation: The Infra Container

A Pod is a logical concept, not a real isolation boundary. Kubernetes actually works with Linux namespaces and cgroups on the host. A Pod is implemented as a set of containers that share specific resources. All containers in a Pod share the same Network Namespace and can share Volumes.

The key to Pod implementation is an Infra container (using the k8s.gcr.io/pause image, a tiny image of only a few hundred kilobytes whose sole job is to sleep in a permanently "paused" state). The Infra container is created first, and all other user containers join its Network Namespace via Docker's --network=container:<infra> option (or the equivalent in other container runtimes).

This architecture means:

  • Containers communicate via localhost.
  • They see the same network devices as the Infra container.
  • The Pod has a single IP address (from the Network Namespace).
  • All network resources are shared across the Pod.
  • The Pod's lifecycle is tied to the Infra container, not to user containers.

Incoming and outgoing traffic effectively passes through the Infra container's network stack. For network plugin developers, this means you only need to configure the Pod's Network Namespace, not individual containers.
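A minimal sketch of this shared Network Namespace in action (the busybox shell container is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-network-demo
spec:
  containers:
  - name: nginx
    image: nginx            # listens on port 80 inside the Pod's Network Namespace
  - name: shell
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    # From this container, `wget -qO- http://localhost` reaches nginx,
    # because both containers joined the Infra container's Network Namespace.
```

Both containers see the same network devices and the same single Pod IP, exactly as the bullets above describe.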

Volume sharing is similarly simplified: volumes are defined at the Pod level. A host directory is shared by declaring the volume in the Pod spec and mounting it in each container.
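For example, a Pod-level hostPath volume mounted into two containers might look like this (paths and names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  volumes:
  - name: shared-data
    hostPath:
      path: /data                    # host directory shared by both containers
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: debian-container
    image: debian
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
```

A file written to /pod-data by the debian container is immediately visible to nginx, because both mounts point at the same Pod-level volume.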

Container Design Patterns in Action

Thinking in terms of Pods encourages you to solve problems that are difficult with single containers alone.

Example 1: WAR File and Web Server

A Java WAR file needs to be deployed in Tomcat's webapps directory. With only Docker, options are limited:

  1. Build a custom Tomcat image with the WAR file embedded—inconvenient for updates.
  2. Mount a hostPath volume into the Tomcat container—requires managing the WAR file distribution on every node.

With a Pod, you can combine two containers:

  • An Init Container (from an image containing only the WAR file) that copies the WAR file to a shared volume.
  • The Tomcat container that mounts the same volume to its webapps directory.

Init containers run sequentially before the user containers start, and each must exit successfully. This pattern elegantly decouples the WAR file from the Tomcat image, enabling independent updates. It is an instance of the sidecar pattern, in which a helper container performs tasks auxiliary to the main container.
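A sketch of this Pod, assuming a hypothetical example/war image that contains only the WAR file at /sample.war:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: javaweb
spec:
  initContainers:
  - name: war
    image: example/war:v1                 # hypothetical image holding only sample.war
    command: ["cp", "/sample.war", "/app/"]
    volumeMounts:
    - name: app-volume
      mountPath: /app
  containers:
  - name: tomcat
    image: tomcat:9
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: app-volume
      mountPath: /usr/local/tomcat/webapps  # Tomcat's webapps directory
  volumes:
  - name: app-volume
    emptyDir: {}                            # shared scratch volume, lives with the Pod
```

To ship a new application version, you rebuild only the WAR image and leave the Tomcat image untouched.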

Example 2: Log Collection

An application writes logs to /var/log. You can run a sidecar container that:

  1. Mounts the same volume to its own /var/log.
  2. Continuously reads and forwards logs to a centralized system like MongoDB or Elasticsearch.

This keeps the main application container focused on its task while the sidecar handles log management.
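A sketch with hypothetical image names—the application and forwarder share the log directory through a Pod-level volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  volumes:
  - name: varlog
    emptyDir: {}
  containers:
  - name: app
    image: example/my-app:v1            # hypothetical application image
    volumeMounts:
    - name: varlog
      mountPath: /var/log               # the app writes its logs here
  - name: log-forwarder
    image: example/log-forwarder:v1     # hypothetical sidecar that tails and ships logs
    volumeMounts:
    - name: varlog
      mountPath: /var/log               # the sidecar reads the same files
```

The application needs no knowledge of the logging backend; swapping Elasticsearch for another system changes only the sidecar.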

The shared Network Namespace also enables sidecars for network management (e.g., Istio's service mesh sidecars) without modifying the main container.

Key Takeaway

In the Kubernetes ecosystem, Pods serve as the analog to virtual machines in traditional infrastructures, while containers are the user programs running within them.

This article is based on content from the Kubernetes in Action series.
