Chapter 3.1 · Pods — The Atomic Unit of Kubernetes

Module 3: Pods, Workloads, and Scheduling

The fundamental building blocks of Kubernetes applications — understanding pods, controllers, and how Kubernetes decides where to run your workloads.

Module 3 of 8 | Difficulty: Intermediate

You have stood up your cluster and explored its architecture. Now for the foundational question: what is the smallest thing Kubernetes schedules onto a node? Not a container — a Pod. Think of a container as a fully furnished room. But a room cannot float in space. It needs an apartment around it: walls, utilities, a shared address. The Pod is that apartment — the smallest unit Kubernetes creates, places, and manages. Everything else in the ecosystem wraps around Pods or connects to them. Master the Pod, and the rest follows.

3.1.1 Why Pods, Not Containers?

Analogy: An Apartment, Not Just a Room

You do not rent a standalone room in a field — you rent an apartment. The apartment gives you a street address, shared plumbing and electricity, walls, and roommates who share those utilities. The room (container) is where you live, but the apartment (Pod) is the smallest independently assignable unit.

A Kubernetes Pod works identically. It is the smallest deployable unit — a wrapper around one or more containers sharing a network namespace (one IP), storage volumes, and IPC. You cannot deploy a bare container; you always deploy a Pod, even with exactly one container.

Visual Description:

graph TD
    subgraph "Pod [Apartment 3B], single IP: 10.244.1.5"
        direction LR
        C1[Container: nginx Room 1]
        C2[Container: log-shipper Room 2]
        V1[shared volume emptyDir]
    end
    C1 <-->|localhost + shared IPC| C2
    C1 --> V1
    C2 --> V1
    style C1 fill:#90caf9,stroke:#1565c0
    style C2 fill:#a5d6a7,stroke:#2e7d32
    style V1 fill:#ffcc80,stroke:#ef6c00

The diagram shows two containers that share the IP 10.244.1.5, reach each other via localhost, and mount the same emptyDir volume. This is composition at the infrastructure level: Kubernetes prefers composing small, focused containers over building monolithic ones.

The shared network namespace is the most important property. Every container sees the same localhost. If nginx listens on port 80 and a sidecar connects to localhost:80, traffic stays entirely within the Pod: roommates talking across the living room, not neighbors mailing letters. A corollary is that two containers in one Pod cannot both bind the same port. Storage volumes are likewise shared: a file written by one container is immediately visible to every other container that mounts the volume, at whatever mount path that container uses.
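To make the shared localhost concrete, here is a minimal sketch of a two-container Pod in which a helper polls nginx over localhost. The Pod name localhost-demo and the container name poller are hypothetical and chosen only for illustration; the images are the ones used elsewhere in this chapter.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo          # hypothetical name
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: poller              # hypothetical helper container
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Both containers share one network namespace, so localhost:80
          # reaches nginx without the request leaving the Pod.
          while true; do
            wget -q -O /dev/null http://localhost:80/ && echo "nginx reachable via localhost"
            sleep 5
          done
```

Once nginx is up, kubectl logs localhost-demo -c poller should show the success message repeating every five seconds.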

⚠️ Common Misconception: Do not cram multiple applications into a single Pod just because you can. Containers in the same Pod are tightly coupled — always scheduled together, started and stopped together, on the same node. If two services can scale independently, they belong in separate Pods. The rule: a Pod should contain only containers that must share localhost and local disk.

🛑 PAUSE & RECALL — 2 minutes

Without looking back:

Why does Kubernetes use Pods instead of scheduling containers directly?
What three resources do containers within a Pod share?
Should a web application and its database go in the same Pod? Why or why not?

Rate your confidence (0–4) before continuing.

3.1.2 Multi-Container Pod Patterns

With the apartment model in mind, let us explore how roommates organize their shared space.

Sidecar Pattern: Concurrent Helper

A sidecar runs alongside the main container for the Pod's entire lifecycle — log shippers, monitoring exporters, TLS proxies. Here is nginx with a sidecar that writes timestamps to a shared file:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-sidecar
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/nginx
    - name: log-toucher
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Append a timestamp to the shared log directory every 10 seconds
          while true; do
            date >> /var/log/nginx/timestamp.log
            sleep 10
          done
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/nginx
  volumes:
    - name: shared-logs
      emptyDir: {}

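Both containers mount the same emptyDir volume at /var/log/nginx. Nginx writes its log files there, and the sidecar appends a timestamp to timestamp.log in the same directory every ten seconds. A real log shipper would instead read the nginx logs from this directory and forward them.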

Init Container Pattern: Sequential Setup

Init containers run to completion before app containers start. Multiple init containers execute sequentially. They are the moving crew that delivers furniture before you move in:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: wait-for-backend
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          until nc -z backend-svc 8080; do
            echo "Waiting for backend-svc:8080..."; sleep 2
          done
  containers:
    - name: myapp
      image: myapp:1.0
      ports:
        - containerPort: 3000

The init container probes backend-svc:8080. Only when it responds does the container exit, allowing myapp to start.
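The sequential ordering of multiple init containers can be sketched as follows. This is an illustrative manifest, not part of the lab: the Pod name, container names, and the /work path are all hypothetical. The second init container succeeds only if the first one has already written its file.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ordered-init-demo       # hypothetical name
spec:
  initContainers:
    - name: step-1-write        # runs first
      image: busybox:1.36
      command: ["sh", "-c", "echo 'generated by step 1' > /work/config.txt"]
      volumeMounts:
        - name: workdir
          mountPath: /work
    - name: step-2-check        # starts only after step-1-write exits 0
      image: busybox:1.36
      # Exits non-zero if the file from step 1 is missing or empty.
      command: ["sh", "-c", "test -s /work/config.txt && cat /work/config.txt"]
      volumeMounts:
        - name: workdir
          mountPath: /work
  containers:
    - name: app                 # starts only after both init containers succeed
      image: nginx:1.25
      volumeMounts:
        - name: workdir
          mountPath: /work
          readOnly: true
  volumes:
    - name: workdir
      emptyDir: {}
```

If either init container fails, the app container does not start; the kubelet retries the failed init container according to the Pod's restartPolicy.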

Pattern Summary

Pattern	Timing	Use Case
Sidecar	Concurrent	Log shipping, monitoring, TLS termination
Init	Before app starts	Database migrations, waiting for dependencies
Ambassador	Concurrent	Proxying external access, adding retry logic
Adapter	Concurrent	Normalizing log formats, protocol translation

The decision rule: if containers need shared localhost and local disk, and must scale and die together, they belong in one Pod. Otherwise, separate Pods connected by Services.
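The adapter row in the table has no manifest of its own in this chapter, so here is a minimal sketch under stated assumptions. The app container is a stand-in that writes plain-text lines; the adapter follows that file and re-emits each line in a key=value format on its own stdout. All names, paths, and the output format are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: adapter-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Stand-in application: writes "<epoch-seconds> tick" lines to a file.
          while true; do
            echo "$(date +%s) tick" >> /var/log/app/app.log
            sleep 5
          done
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-adapter
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Follow the app's log file (retrying until it exists) and
          # rewrite each line into key=value form on stdout.
          tail -n +1 -F /var/log/app/app.log | while read -r ts msg; do
            echo "ts=$ts msg=$msg source=app"
          done
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}
```

The two containers are coupled only through the shared volume, which is why they satisfy the decision rule: the adapter is useless without the app's local disk, and it should live and die with it.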

3.1.3 Pod Lifecycle and Status

Visual Description:

stateDiagram-v2
    [*] --> Pending: Pod created
    Pending --> Running: Container(s) started
    Running --> Succeeded: All exit 0
    Running --> Failed: Non-zero exit
    Running --> Unknown: Node lost
    Succeeded --> [*]: Cleaned up
    Failed --> [*]: Remains for inspection

Pod Phases

Phase	Meaning
Pending	Accepted by cluster but not yet running — still being scheduled or pulling images.
Running	Bound to a node; at least one container is running.
Succeeded	All containers terminated successfully. Typical for Jobs.
Failed	All containers terminated; at least one exited non-zero.
Unknown	Cluster cannot determine state — usually the node is unreachable.

Container States and Conditions

Each container has its own state: Waiting (preparing, pulling image), Running (executing), or Terminated (finished, with exit code and reason). Pod conditions provide finer detail: PodScheduled (assigned to a node), Initialized (all init containers completed successfully), ContainersReady (every container is ready, which includes passing its readiness probe if one is defined), and Ready (the Pod can serve requests; Services route traffic only to Ready Pods).
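The link between readiness probes and the ContainersReady and Ready conditions can be sketched with a single-container Pod. The Pod name and the probe timings below are illustrative assumptions, not recommended values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo          # hypothetical name
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /               # nginx serves its default page here
          port: 80
        initialDelaySeconds: 2  # wait before the first probe
        periodSeconds: 5        # probe every 5 seconds
        failureThreshold: 3     # mark unready after 3 consecutive failures
```

Until the probe succeeds, ContainersReady and Ready stay False and the Pod receives no traffic from Services, even though its phase is already Running.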

Restart Policies

Policy	Behavior	Use Case
Always	Restart regardless of exit code	Long-running services (default)
OnFailure	Restart only on non-zero exit	Jobs, batch workloads
Never	Never restart	Debugging, one-off tasks

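⚠️ Common Misconception: restartPolicy: Always does not mean your container is immortal. It means the kubelet restarts the container whenever it exits. Repeated crashes trigger an exponential back-off delay (10s, 20s, 40s, and so on), capped at five minutes between attempts, which kubectl reports as CrashLoopBackOff. The delay resets once a container has run for ten minutes without problems.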
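To watch a restart policy and the back-off in action, a deliberately failing Pod is enough. This is a sketch for experimentation only; the Pod and container names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crash-demo              # hypothetical name
spec:
  restartPolicy: OnFailure      # restart only when the exit code is non-zero
  containers:
    - name: flaky
      image: busybox:1.36
      # Always exits with code 1, so the kubelet keeps restarting it
      # with a growing delay between attempts.
      command: ["sh", "-c", "echo 'about to fail'; sleep 2; exit 1"]
```

Running kubectl get pod crash-demo -w should show the RESTARTS count climbing while the status alternates between Error and CrashLoopBackOff. Changing exit 1 to exit 0 lets the Pod finish in the Succeeded phase with no restart, which is the behavior that makes OnFailure a good fit for batch work.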

3.1.4 Resource Management: Requests vs. Limits

Kubernetes needs you to declare what your container consumes. Two distinct concepts drive very different behaviors.

Requests are used by the Scheduler to find a suitable node. They represent a guaranteed minimum, a reservation. Limits are enforced on the node by the kubelet and container runtime through Linux cgroups. They are a hard ceiling.

resources:
  requests:
    cpu: "100m"       # 0.1 cores — guaranteed minimum
    memory: "128Mi"   # 128 MiB — guaranteed minimum
  limits:
    cpu: "500m"       # 0.5 cores — maximum allowed
    memory: "256Mi"   # 256 MiB — maximum allowed

This container is guaranteed 0.1 CPU and 128Mi memory. On a node with spare capacity, CPU may burst to 0.5 cores. But exceed 256Mi memory, and the container is OOMKilled immediately.

Resource	Compressible?	Exceeding Limit
CPU	Yes	Throttled — runs slower, not killed
Memory	No	OOMKilled — terminated immediately
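The memory row of the table can be reproduced with a container that deliberately allocates more than its limit. The sketch below follows the pattern used in the upstream Kubernetes documentation and assumes the third-party polinux/stress image is pullable from your cluster; the Pod and container names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                # hypothetical name
spec:
  restartPolicy: Never          # keep the terminated container for inspection
  containers:
    - name: memory-hog
      image: polinux/stress     # assumption: third-party image is available
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"       # hard ceiling enforced by the kernel
      command: ["stress"]
      # Try to allocate about 250 MB, well above the 100Mi limit.
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
```

After a few seconds, kubectl describe pod oom-demo should report the container as Terminated with reason OOMKilled. A CPU-bound equivalent would keep running, only more slowly.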

Best Practice: Requests Equal Limits

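For predictable production workloads, set requests == limits for both CPU and memory in every container. When they match, the Scheduler reserves exactly the capacity the container is allowed to use, so the workload never depends on spare node capacity that a neighbor might claim. Your Pod receives the Guaranteed QoS class and is among the last candidates for eviction under node pressure. The limits still apply: a container that hits its CPU limit is throttled, and one that exceeds its memory limit is OOMKilled, so size both values for real peak usage. This is one of the most effective resource-management decisions for production stability.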

🤔 TRY BEFORE YOU SEE

Your container typically uses 200m CPU and 512Mi memory at steady state, but spikes to 1 CPU and 1Gi memory. Under memory pressure, would you prefer throttling or killing? Given memory is incompressible, what is the safest configuration for a critical production service?

Write your answer before reading the reveal.

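Reveal: Exceeding a memory limit always triggers OOMKill, an abrupt termination that drops in-flight requests. For critical services, set requests == limits for memory at the genuine peak (e.g., memory: "1Gi" for both) so the container never relies on memory it has not reserved. For CPU, requests: "200m" with limits: "1000m" is acceptable if occasional throttling is tolerable, but the mismatch makes the Pod Burstable. To earn Guaranteed QoS, CPU requests and limits must match as well (e.g., "1000m" for both). Never set memory limits below genuine peak usage, because that causes crash loops.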
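Written out as a resources stanza, the configuration discussed in the reveal looks roughly like this. It is a fragment to place under a container entry, and the values come from the exercise rather than from any measured workload.

```yaml
resources:
  requests:
    cpu: "200m"       # steady-state CPU, used by the Scheduler for placement
    memory: "1Gi"     # reserve the full memory peak
  limits:
    cpu: "1000m"      # CPU may burst up to 1 core, then it is throttled
    memory: "1Gi"     # equal to the request: no reliance on unreserved memory
```

Because the CPU request and limit differ, Kubernetes classifies a Pod using this stanza as Burstable. Setting cpu to the same value in both places (for example "1000m") is what moves it to Guaranteed.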

3.1.5 Quality of Service Classes

Kubernetes silently assigns every Pod a QoS class based on resource specs. This class becomes critical when a node exhausts resources — it determines which Pods die first.

Visual Description:

graph TD
    A[Pod Resource Spec] --> B{Any CPU or memory request or limit set in ANY container?}
    B -->|No| F["BestEffort Evicted FIRST"]
    B -->|Yes| C{CPU and memory requests == limits for ALL containers?}
    C -->|Yes| D["Guaranteed Evicted last"]
    C -->|No| E["Burstable Evicted after BestEffort"]
    style D fill:#4caf50,stroke:#2e7d32,color:#fff
    style E fill:#ff9800,stroke:#ef6c00,color:#fff
    style F fill:#f44336,stroke:#c62828,color:#fff

QoS Class	Requirement	Eviction Order	Best For
Guaranteed	Every container: `requests == limits` for CPU and memory	Last (safest)	Production workloads, databases
Burstable	Some mismatch or partial specification	After BestEffort	Development, batch jobs
BestEffort	No resources specified at all	First (sacrificed)	Debug shells, trivial pods

When memory pressure hits, the kubelet evicts in order: BestEffort → Burstable → Guaranteed. Guaranteed Pods are the critical patients who keep their beds; BestEffort Pods are asked to leave first. Leaving Pods as BestEffort in production is dangerous — they become the first casualties whenever a node faces pressure.
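The three classes can be compared side by side with three minimal Pods. The sketch below uses hypothetical Pod names and arbitrary resource values; only the shape of each resources stanza matters.

```yaml
# Guaranteed: CPU and memory set, requests == limits, in every container.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed          # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
---
# Burstable: something is set, but the Guaranteed criteria are not met.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable           # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
---
# BestEffort: no requests or limits anywhere in the Pod.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort          # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25
```

After applying the file, kubectl get pod NAME -o jsonpath='{.status.qosClass}' should print the class named in each Pod's comment.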

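⚠️ Common Misconception: QoS class does not decide Pod preemption. When the Scheduler needs to remove running Pods to place a higher-priority pending Pod, it picks victims by Pod priority (set through a PriorityClass), not by QoS class. QoS class matters for kubelet eviction under node resource pressure. The two mechanisms are independent, so a Guaranteed Pod with low priority can still be preempted.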
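Scheduler preemption is governed by Pod priority, which is configured separately from resource requests and limits. A minimal sketch, assuming a hypothetical PriorityClass named critical-service with an arbitrary value:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service        # hypothetical name
value: 100000                   # higher value means higher priority
globalDefault: false            # apply only to Pods that name this class
description: "Hypothetical class for latency-critical services."
---
apiVersion: v1
kind: Pod
metadata:
  name: priority-demo           # hypothetical name
spec:
  priorityClassName: critical-service
  containers:
    - name: nginx
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
```

This Pod is both high priority (harder to preempt) and Guaranteed (evicted late under node pressure). Each property comes from a different part of the spec.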

🛑 PAUSE & RECALL — 2 minutes

Without looking back:

What is the difference between a resource request and a limit? Which does the Scheduler use, and which does the runtime enforce?
What happens when a container exceeds its CPU limit? Its memory limit?
Name the three QoS classes in eviction-priority order.
What configuration gives a Pod the Guaranteed QoS class?

Rate your confidence (0–4) before continuing.

GKE in Practice: Pod Behavior on Google Kubernetes Engine

GKE Note: GKE Autopilot and Standard treat Pods differently — and Autopilot's enforcement surprises many beginners.

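Autopilot Manages Resources. Deploy a Pod without resource requests and Autopilot does not run it as BestEffort: it fills in default requests, adjusts values that fall outside its allowed ranges, and bills you for what the Pod requests. The values actually applied can differ from what you wrote, so inspect the admitted Pod spec and write complete resource stanzas from day one.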

Node-Level Agents. Every GKE node runs fluentbit-gke (logs) and gke-metrics-agent (metrics) as DaemonSets — the building maintenance staff on every floor. Your container logs are scraped and forwarded to Cloud Logging automatically; no sidecar needed for basic log shipping.

Cost Allocation Tags. GKE supports GCP cost allocation tags via Kubernetes labels that flow to billing exports. Tag Pods consistently from the start — retroactive attribution is painful.
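Consistent labeling is easiest when it is part of every manifest from the start. The label keys and values below are hypothetical examples of a tagging scheme; whether and how they appear in billing exports depends on how cost allocation is configured for the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: labeled-pod             # hypothetical name
  labels:
    team: payments              # hypothetical: owning team
    env: production             # hypothetical: deployment environment
    cost-center: cc-1234        # hypothetical: finance identifier
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
```

The same labels also work as selectors, for example kubectl get pods -l team=payments.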

Lab: LAB-3.1 — Mastering Pods (60 minutes)

Build a multi-container Pod, observe containers sharing resources, deploy an init container, inspect per-container logs, and verify QoS class assignment.

Step 1: Create a Multi-Container Pod with Shared Storage

Save as multi-container-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-sidecar
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: content-writer
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Rewrite the page served by nginx every 5 seconds
          while true; do
            echo "<h1>Updated at $(date)</h1>" > /shared/index.html
            sleep 5
          done
      resources:
        requests:
          cpu: "50m"
          memory: "64Mi"
        limits:
          cpu: "50m"
          memory: "64Mi"
      volumeMounts:
        - name: shared-data
          mountPath: /shared
  volumes:
    - name: shared-data
      emptyDir: {}

kubectl apply -f multi-container-pod.yaml

Step 2: Verify the Sidecar Pattern

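kubectl port-forward pod/nginx-with-sidecar 8080:80 &
sleep 2   # give the port-forward a moment to start listening
curl http://localhost:8080
sleep 6   # content-writer rewrites index.html every 5 seconds
curl http://localhost:8080

The two curl calls return different timestamps because they are more than five seconds apart: the content-writer sidecar writes to the shared volume, and nginx serves the file without knowing who wrote it. Stop the background port-forward when you are finished, for example with kill %1 if it is the only background job in your shell.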

Step 3: Deploy a Pod with an Init Container

Save as init-container-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: init-wait
      image: busybox:1.36
      command: ["sh", "-c", "echo 'Init starting...'; sleep 10; echo 'Init complete!'"]
  containers:
    - name: main-app
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "100m"
          memory: "128Mi"

kubectl apply -f init-container-pod.yaml
kubectl get pod app-with-init -w

Watch Init:0/1 → PodInitializing → Running. The main container starts only after the init container finishes.

Step 4: Container-Specific Logs

kubectl logs nginx-with-sidecar -c nginx
kubectl logs nginx-with-sidecar -c content-writer
kubectl logs app-with-init -c init-wait

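The -c flag targets a specific container. Without it, recent kubectl versions fall back to a default container (normally the first one listed) and print a notice, while older versions return an error asking you to choose. Always pass -c in multi-container Pods.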

Step 5: Inspect QoS Class

kubectl get pod nginx-with-sidecar -o jsonpath='{.status.qosClass}'

Expected: Guaranteed. Compare with BestEffort:

kubectl run best-effort-pod --image=nginx:1.25
kubectl get pod best-effort-pod -o jsonpath='{.status.qosClass}'

Expected: BestEffort — no resources specified, first to be evicted under pressure.

Cleanup

kubectl delete -f multi-container-pod.yaml
kubectl delete -f init-container-pod.yaml
kubectl delete pod best-effort-pod

Chapter Summary

A Pod is Kubernetes' atomic unit: a wrapper around one or more containers sharing a network namespace (single IP), storage volumes, and IPC. Multi-container Pods follow established patterns: sidecars run concurrently to enhance the main app, while init containers perform sequential setup before the app starts. Pods progress through phases (Pending → Running → Succeeded/Failed), with each container reporting Waiting, Running, or Terminated states. Resource requests drive scheduling; limits enforce hard ceilings. CPU is throttled at its limit, and exceeding a memory limit causes OOMKill. The QoS class (Guaranteed, Burstable, BestEffort) derives from resource specifications and determines eviction priority under node pressure. On GKE Autopilot, resource specifications are effectively mandatory: Autopilot applies defaults to anything you omit.

📇 KEY CONCEPT CARDS

Q: What is a Pod, and why is it the smallest deployable unit rather than a container?
A: A Pod hosts one or more containers that share a network namespace (single IP), storage volumes, and IPC. Containers within a Pod are tightly coupled — scheduled together, started together, sharing local resources.

Q: What is the difference between a sidecar and an init container?
A: A sidecar runs concurrently with the main container for the Pod's entire lifecycle (e.g., log shipping). An init container runs to completion before any app container starts; multiple init containers execute sequentially (e.g., waiting for dependencies).

Q: What happens when a container exceeds its CPU limit versus its memory limit?
A: CPU is compressible — exceeding the limit causes throttling. Memory is incompressible — exceeding the limit triggers immediate OOMKill by the Linux kernel.

Q: How is QoS class determined, and what does it affect?
A: Guaranteed requires requests == limits for both CPU and memory in every container. Burstable means at least one request or limit is set but the Guaranteed criteria are not met. BestEffort means no requests or limits are set at all. QoS determines eviction priority when nodes face resource pressure: BestEffort first, Burstable second, Guaranteed last.