Chapter 2.2 · Kubernetes Architecture: Control Plane and Worker Nodes

In the previous chapter, you created your first GKE cluster and watched Kubernetes bring Pods to life. But what actually happened inside? How did Kubernetes know where to run your Pods? How does it keep them running when things go wrong? And what is GKE doing behind the scenes that you can't see?

The answer lies in Kubernetes architecture — the engine room. This chapter is where you stop being a user and start becoming an administrator. Copy-pasters deploy YAML; good administrators understand the machinery.

Analogy: The Modern Restaurant Kitchen

Imagine a high-end restaurant during dinner rush. Orders pour in, dishes must meet exact specifications, and every station coordinates seamlessly. Kubernetes works like that kitchen: a kitchen management team (the Control Plane) makes decisions and tracks orders, while cooking stations (the Worker Nodes) do the actual preparation. Each Kubernetes component maps to a kitchen role throughout this chapter, giving you a tangible mental model for every technical detail.

2.2.1 The Control Plane — Kubernetes' Brain

The Control Plane makes every significant decision: what runs where, what should be running, and the current state of the entire system. In a restaurant, this is the kitchen management team who coordinate from a central position.

The Control Plane consists of four core components on dedicated master nodes:

API Server — the front door for all communication
etcd — the source of truth for all cluster data
Scheduler — the placement engine
Controller Manager — the eternal watchdog

There's also a Cloud Controller Manager on cloud-hosted clusters for cloud-provider integrations like load balancers.

The critical principle is separation of control and data planes. The Control Plane runs only cluster-management software — never user workloads. This ensures that even when your applications misbehave, the cluster's ability to manage itself remains intact. It's why a restaurant's management office is separate from the cooking floor: a grease fire at one station shouldn't shut down the ordering system.

For high availability, production clusters run the Control Plane across multiple masters (typically three or five). If one master fails, the others continue.

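GKE Note: On GKE, the Control Plane is entirely managed by Google. You don't see master nodes, manage etcd, or SSH into the API Server. In regional clusters (including all Autopilot clusters), Google replicates the Control Plane across multiple zones; a zonal cluster has a single control plane replica. Control plane upgrades are applied automatically. You're buying a managed brain, not assembling one.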

2.2.2 The API Server — The Front Door

Analogy: The Expediter Window

In a kitchen, every communication between the dining room and cooking stations passes through the expediter window. Nothing goes directly from a server to a line cook. The API Server is that window — every cluster interaction, whether from kubectl, a controller, the Scheduler, or a kubelet, flows through it.

The API Server is the only component that talks directly to etcd. When the Scheduler picks a Node, it tells the API Server, which validates and persists the decision. When a kubelet reports status, the API Server updates etcd. This centralized gatekeeping ensures consistency and enables the authentication, authorization, and admission control layers that protect your cluster.

Every Kubernetes object is exposed through a RESTful API where each type is a resource. Deployments, for example, live at /apis/apps/v1/namespaces/{namespace}/deployments. When you run kubectl apply, kubectl converts your YAML to JSON and sends it to the API Server over HTTPS: typically a POST to create a new object, or a PATCH to update one that already exists.
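To make the URL scheme concrete, here is a minimal sketch that builds resource paths the way the API lays them out. The helper name resource_path is hypothetical (it is not part of kubectl or any client library); the one rule it encodes is that the legacy core group lives under /api while named groups such as apps live under /apis.

```python
from typing import Optional


def resource_path(group: str, version: str, resource: str,
                  namespace: Optional[str] = None,
                  name: Optional[str] = None) -> str:
    """Build the URL path for a resource collection or a single object."""
    # The core group (Pods, Services, Nodes) is addressed as /api/<version>;
    # every named group (apps, batch, ...) is addressed as /apis/<group>/<version>.
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [prefix]
    if namespace is not None:
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


print(resource_path("apps", "v1", "deployments", namespace="default"))
print(resource_path("", "v1", "pods", namespace="kube-system", name="example-pod"))
print(resource_path("", "v1", "nodes"))  # cluster-scoped, so no namespace segment
```

The first call prints the Deployment collection path quoted above with the namespace filled in. Cluster-scoped resources such as Nodes simply omit the namespaces segment.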

Every request passes through three layers before reaching etcd: Authentication (who are you?), Authorization (what can you do?), and Admission Control (should this be allowed or modified?). The API is also extensible in two ways: CustomResourceDefinitions (CRDs) let tools like cert-manager add their own resource types, and the aggregation layer lets separate API servers such as metrics-server plug in additional endpoints.
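The three layers can be pictured as a short pipeline in which any stage may reject the request before anything is stored. The sketch below is a toy model, not the API Server's real code: the token table, the permission set, and the "team" label policy are all invented for illustration.

```python
# Hypothetical stand-ins for real authentication and RBAC backends.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ALLOWED = {("alice", "create", "deployments")}


class Rejected(Exception):
    """Raised by whichever stage refuses the request."""


def authenticate(request):
    user = TOKENS.get(request.get("token"))
    if user is None:
        raise Rejected("401 Unauthorized")
    return user


def authorize(user, request):
    if (user, request["verb"], request["resource"]) not in ALLOWED:
        raise Rejected("403 Forbidden")


def admit(request):
    obj = dict(request["object"])
    labels = dict(obj.get("labels", {}))
    labels.setdefault("team", "unassigned")   # mutating step: inject a default
    obj["labels"] = labels
    if not obj.get("name"):                   # validating step: enforce a policy
        raise Rejected("admission denied: name is required")
    return obj


def handle(request, store):
    """Run the three stages in order; persist only if all of them pass."""
    try:
        user = authenticate(request)          # who are you?
        authorize(user, request)              # what can you do?
        obj = admit(request)                  # should this be allowed or modified?
    except Rejected as err:
        return str(err)
    store[obj["name"]] = obj                  # stand-in for the write to etcd
    return "201 Created"


store = {}
create = {"verb": "create", "resource": "deployments", "object": {"name": "web"}}
print(handle({**create, "token": "token-alice"}, store))
print(handle({**create, "token": "token-bob"}, store))
print(handle({**create, "token": "expired"}, store))
```

Only the first request reaches the store. The second fails authorization and the third fails authentication, so neither ever gets as far as admission.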

⚠️ Common Misconception: "The API Server is just a REST API gateway." It handles watch notifications, schema validation, resource versioning, and the entire request pipeline. It's the beating heart of the cluster.

2.2.3 etcd — The Source of Truth

Analogy: The Order Ticket Rail

In a restaurant, every active order is written on a ticket pinned to a central rail. Everyone — cooks, expediters, managers — can look at this rail and know exactly what's happening. If it were lost, the kitchen would have no idea what anyone ordered. etcd is that ticket rail — the single, authoritative record of everything in your cluster.

etcd is a distributed, consistent key-value store holding all cluster state: every Pod spec, Service definition, Secret, and Node status. What makes etcd special is the Raft consensus algorithm. In a multi-master cluster, etcd runs as three or five members, and every write must be acknowledged by a majority (a "quorum"). A three-node cluster tolerates one failure; a five-node cluster tolerates two.
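The quorum arithmetic is easy to check for yourself. This short sketch assumes only the majority rule described above; the function names are made up for the example.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

Notice that four members tolerate no more failures than three, which is why odd cluster sizes are the norm.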

Why is etcd backup the most critical operational task? Because everything is in it. Lose etcd without a backup and you lose all cluster state. Running Pods may continue, but Kubernetes has no record of what should exist. It's like losing the order ticket rail during dinner service: food might still be cooking, but nobody knows what's supposed to be happening.

GKE Note: You never interact with etcd on GKE. Google manages backups, scaling, and recovery — a significant advantage, as etcd administration is among the most error-prone aspects of self-managing Kubernetes.

🛑 PAUSE & RECALL — 3 minutes

Without looking back:

Why is the API Server the only component that talks directly to etcd?
In the restaurant analogy, what does etcd represent? What happens if it disappears?
A three-node etcd cluster tolerates how many failures? A five-node cluster?

Rate your confidence (0–4).

2.2.4 The Scheduler — The Placement Engine

Analogy: The Station Assigner

In a kitchen, someone decides which cook handles which order. The pasta station shouldn't get dessert. The Scheduler is that station assigner — it watches for unassigned Pods and finds the best Node to run them.

The Scheduler uses a two-phase algorithm: filter, then score.

Phase 1: Filter eliminates Nodes that cannot run the Pod. Not enough CPU or memory? Labels that don't match the Pod's nodeSelector? Taints that the Pod doesn't tolerate? Anti-affinity rules that block placement? If no Node passes, the Pod stays Pending.

Phase 2: Score ranks remaining Nodes by desirability, considering resource balance, affinity preferences, topology spread, and custom priorities. The highest-scoring Node wins.
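The two phases can be sketched in a few lines. This is a deliberately simplified model, not the real Scheduler: it checks only resources, nodeSelector, and taints in the filter phase, and the scoring rule (prefer the Node with the most headroom left) is one illustrative choice among many. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int                                  # free CPU in millicores
    free_mem_mi: int                                 # free memory in MiB
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_m: int
    mem_mi: int
    node_selector: dict = field(default_factory=dict)
    tolerations: set = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Phase 1: hard constraints; one failure removes the Node."""
    fits = node.free_cpu_m >= pod.cpu_m and node.free_mem_mi >= pod.mem_mi
    selected = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    tolerated = node.taints <= pod.tolerations       # every taint must be tolerated
    return fits and selected and tolerated


def score(pod: Pod, node: Node) -> float:
    """Phase 2: toy score, the average fraction of resources left over."""
    cpu_left = (node.free_cpu_m - pod.cpu_m) / max(node.free_cpu_m, 1)
    mem_left = (node.free_mem_mi - pod.mem_mi) / max(node.free_mem_mi, 1)
    return (cpu_left + mem_left) / 2


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None                                  # the Pod would stay Pending
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", 500, 1024),
    Node("node-b", 2000, 4096),
    Node("node-gpu", 4000, 8192, labels={"gpu": "true"}, taints={"gpu-only"}),
]
print(schedule(Pod("web", 250, 256), nodes))
print(schedule(Pod("trainer", 1000, 2048, {"gpu": "true"}, {"gpu-only"}), nodes))
print(schedule(Pod("huge", 8000, 256), nodes))
```

The web Pod lands on node-b because the tainted GPU Node is filtered out and node-b has more headroom than node-a. The trainer Pod tolerates the taint and selects the GPU label, so node-gpu is its only candidate. The oversized Pod passes no filter and gets None, the sketch's equivalent of Pending.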

You can customize scheduling with labels and nodeSelector, node affinity, taints and tolerations (a taint repels Pods from a Node unless the Pod carries a matching toleration), and pod affinity/anti-affinity (to co-locate or spread Pods).

GKE Note: GKE extends scheduling with node auto-provisioning. When no suitable Node exists, GKE can automatically create a new node pool. The Scheduler's concept of "placement" expands from "which existing Node" to "what infrastructure should exist."

⚠️ Common Misconception: "The Scheduler picks Nodes randomly." Scheduling is a sophisticated multi-step optimization. Pods stuck in Pending are due to filter-phase failures, not randomness.

2.2.5 The Controller Manager — The Eternal Watchdog

Analogy: The Quality Control Team

In a restaurant, a quality team continuously walks the floor, noticing when a dish doesn't match its order or a station falls behind. Their job is never done. The Controller Manager is that team, and its fundamental mechanism is the control loop.

The Control Loop — Kubernetes' Most Important Pattern

Every controller follows this cycle:

Observe current state (read from the API Server)
Compare with desired state (the resource spec)
Act to reduce the gap
Repeat indefinitely

Controllers don't just respond to changes — they continuously reconcile. Even when nothing changes, they periodically verify reality matches expectations.

Visual Description:

graph LR
    A[Observe current state] --> B[Compare to desired state]
    B --> C{Match?}
    C -->|No| D[Take action via API Server]
    D --> E[State updated in etcd]
    E --> A
    C -->|Yes| F[Wait & repeat]
    F --> A
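The observe, compare, act cycle fits in a single function. The sketch below models one reconcile pass of a ReplicaSet-style controller over an in-memory set of Pod names. It is an illustration of the pattern under simplified assumptions, not the controller's actual implementation, and the naming scheme is invented.

```python
import itertools


def reconcile(desired: int, running: set, names) -> list:
    """One pass of observe -> compare -> act; returns the actions taken."""
    actions = []
    gap = desired - len(running)                     # compare desired with actual
    for _ in range(max(gap, 0)):                     # too few Pods: create more
        name = next(names)
        running.add(name)
        actions.append(f"create {name}")
    for name in sorted(running)[:max(-gap, 0)]:      # too many Pods: remove extras
        running.remove(name)
        actions.append(f"delete {name}")
    return actions


names = (f"pod-{i}" for i in itertools.count(1))
running = set()

print(reconcile(3, running, names))    # initial rollout
running.discard("pod-2")               # simulate a Pod disappearing
print(reconcile(3, running, names))    # the loop closes the gap
print(reconcile(3, running, names))    # nothing to do, so no actions
print(reconcile(1, running, names))    # scale down to one replica
```

The function never asks why the counts differ. A crash, a manual deletion, and a scale-up all look the same to it, which is what makes the pattern robust.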

The Controller Manager runs many controllers, each watching a specific resource type:

Controller	Function
Deployment	Creates and manages ReplicaSets
ReplicaSet	Ensures correct Pod count exists
StatefulSet	Manages stateful apps with stable identity
Node	Monitors Node health, evicts Pods when nodes fail
EndpointSlice	Maintains Pod IP lists for Services
Job/CronJob	Creates Pods that run to completion or on a schedule

Each operates independently, watching the API Server and taking action. If a controller crashes, it restarts and picks up where it left off — its state is entirely in etcd.

🤔 TRY BEFORE YOU SEE

You create a Deployment with replicas: 3. Later, you SSH to a worker node and manually delete one container using crictl rm.

Predict what happens step by step. Who notices? What triggers replacement? How long does it take?

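Reveal: (1) The kubelet notices the container is gone the next time it syncs with the container runtime. (2) Because Pods in a Deployment use restartPolicy: Always, the kubelet starts a replacement container inside the same Pod and reports the updated status to the API Server, which writes it to etcd. (3) The Pod object never disappeared, so the ReplicaSet Controller still counts 3 Pods and does nothing, and the Scheduler is never involved. Total time: typically a few seconds. Compare this with deleting the Pod object itself (kubectl delete pod): the ReplicaSet Controller sees only 2 Pods when it expects 3 and creates a new Pod via the API Server, the Scheduler assigns it to a Node, and that Node's kubelet starts its containers. Both are control loops closing the gap between actual and desired state, just at different layers.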

2.2.6 Worker Node Components — Where the Work Happens

Analogy: The Cooking Stations

If the Control Plane is the kitchen management team, Worker Nodes are the cooking stations. Every Worker Node runs three essential components.

kubelet — The Node Agent

The kubelet is the cook at each station. It registers the Node with the cluster, receives Pod specs from the API Server, and ensures containers are running and healthy via the container runtime. It reports Node and Pod status back to the API Server. If a kubelet stops responding, the Node Controller marks the Node NotReady after a grace period (40 seconds by default in many releases). Pods are not evicted immediately: they are removed only once their toleration for the not-ready taint expires, which is 5 minutes by default.
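As a rough timeline of a silent kubelet, the sketch below classifies what happens to a Pod as the silence grows. The 40-second grace period comes from the text above. The 300-second toleration is an assumption based on the usual upstream default, and the function name is hypothetical; check both values against your own cluster.

```python
def pod_fate(seconds_silent: float, grace_period: float = 40,
             toleration: float = 300) -> str:
    """Classify a Pod on a Node whose kubelet stopped reporting."""
    if seconds_silent <= grace_period:
        return "Node Ready, Pod untouched"
    if seconds_silent <= grace_period + toleration:
        # The Node is marked NotReady, but the Pod tolerates that for a while.
        return "Node NotReady, Pod still bound"
    return "Pod evicted"


for t in (10, 60, 400):
    print(t, pod_fate(t))
```

With these defaults a Pod on a failed Node is replaced after minutes, not seconds, which matters when you estimate recovery times.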

kube-proxy — The Network Rule Manager

The kube-proxy is the food runner who knows all delivery routes. It maintains network rules that implement Kubernetes Services. When you create a ClusterIP Service, kube-proxy configures iptables or IPVS rules so traffic to the Service's virtual IP is load-balanced to backing Pods. Importantly, kube-proxy is not a proxy that traffic flows through — it's a rule manager that configures the kernel's packet filtering. Traffic goes directly from source to destination Pod.
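One way to see how kernel rules alone can balance traffic: in iptables mode, the rules for a Service are evaluated in order, and each one matches with a probability chosen so that every backend ends up equally likely. The sketch below reproduces that arithmetic with exact fractions. It models the idea only; the function names are invented and nothing here touches iptables.

```python
from fractions import Fraction


def rule_probabilities(backends: int) -> list:
    """Match probability for each sequential rule, one rule per backend."""
    # Rule i is reached only if all earlier rules missed, so it must match
    # with probability 1 / (backends still remaining).
    return [Fraction(1, backends - i) for i in range(backends)]


def effective_share(probabilities: list) -> list:
    """Overall fraction of traffic each backend receives."""
    shares, remaining = [], Fraction(1)
    for p in probabilities:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probs = rule_probabilities(3)
print([str(p) for p in probs])
print([str(s) for s in effective_share(probs)])
```

For three backends the per-rule probabilities are 1/3, 1/2, and 1, and each backend receives exactly one third of the connections. The kernel makes that choice per connection, with no user-space process in the data path.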

⚠️ Common Misconception: "kube-proxy is an HTTP proxy like nginx." It is not. kube-proxy typically runs on every Node (often as a DaemonSet) and programs the kernel's packet-forwarding rules (iptables or IPVS). No Service traffic passes through the kube-proxy process itself.

Container Runtime — The Equipment

The container runtime is the kitchen equipment: ovens, grills, mixers. Kubernetes supports any runtime implementing the Container Runtime Interface (CRI). On GKE, this is containerd, a lightweight runtime that pulls images, creates containers using Linux namespaces and cgroups, and manages their lifecycle. Docker wraps containerd in build tooling and a developer-facing CLI; Kubernetes needs only the runtime, so it talks to containerd directly through CRI.

CNI Plugins — The Plumbing

CNI (Container Network Interface) plugins assign Pod IP addresses and ensure every Pod can reach every other Pod. On GKE, CNI uses VPC-native networking, giving each Pod a real IP from your VPC subnet.
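To get a feel for how Pod IPs are carved out of a VPC range, the sketch below splits a Pod range into one block per Node using Python's standard ipaddress module. The 10.8.0.0/14 range is made up, and the /24 block per Node is an assumption (it matches a commonly cited GKE default, but verify it for your cluster).

```python
import ipaddress


def node_pod_ranges(pod_range: str, per_node_prefix: int = 24):
    """Yield one Pod CIDR block per Node, carved from the cluster's Pod range."""
    return ipaddress.ip_network(pod_range).subnets(new_prefix=per_node_prefix)


ranges = node_pod_ranges("10.8.0.0/14")
first, second = next(ranges), next(ranges)

print(first, second)           # blocks handed to the first two Nodes
print(first.num_addresses)     # addresses inside each per-Node block
print(2 ** (24 - 14))          # how many Nodes this Pod range can serve
```

Under these assumptions the range supports 1,024 Nodes with 256 addresses each, so the size of the Pod range puts a ceiling on cluster size.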

graph TD
    subgraph "Control Plane [Kitchen Management]"
        API[API Server Expediter Window]
        ETCD[etcd Order Ticket Rail]
        SCHED[Scheduler Station Assigner]
        CTRL[Controller Manager Quality Control]
    end
    subgraph "Worker Node A [Station A]"
        KUBE1[kubelet Cook]
        PROXY1[kube-proxy Runner]
        RUN1[containerd Equipment]
        POD1A[Pod]
        POD1B[Pod]
    end
    subgraph "Worker Node B [Station B]"
        KUBE2[kubelet Cook]
        PROXY2[kube-proxy Runner]
        RUN2[containerd Equipment]
        POD2A[Pod]
    end
    API -->|Pod specs| KUBE1
    API -->|Pod specs| KUBE2
    KUBE1 -->|Status| API
    KUBE2 -->|Status| API
    API --> ETCD
    CTRL -->|Watch/Update| API
    SCHED -->|Bind| API
    KUBE1 -->|Run| RUN1
    KUBE2 -->|Run| RUN2
    style API fill:#ffcc80
    style ETCD fill:#ce93d8
    style SCHED fill:#90caf9
    style CTRL fill:#ef9a9a
    style KUBE1 fill:#a5d6a7
    style KUBE2 fill:#a5d6a7

2.2.7 GKE Architecture Specifics

Analogy: A Restaurant with an Invisible Management Team

On GKE, Google manages the kitchen management team — invisible but always working. You only interact with the expediter window (the API Server endpoint).

What Google Manages vs. What You Manage

Component	Standard GKE	Autopilot
Control Plane (API Server, etcd, Scheduler, Controller Manager)	Google-managed	Google-managed
Worker Node OS and container runtime	Google-managed	Google-managed
Worker Node provisioning and scaling	You configure node pools	Fully automatic
CNI networking	Google-managed (VPC-native)	Google-managed
Control plane upgrades	Google-managed	Google-managed
Add-ons (CoreDNS, metrics-server)	Google-managed	Google-managed

Key GKE Architecture Features

VPC-native networking: Pods get real IPs from your VPC subnet — no overlay, no performance penalty.
Multi-zonal HA in regional clusters: The control plane is replicated across multiple zones in the region. Zonal clusters run a single control plane replica.
Control plane logs via Cloud Logging: Stream API Server audit logs for security monitoring.
Master authorized networks: Restrict which IP ranges can reach the API Server.
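Private clusters: Nodes get only internal IP addresses, and the API Server's public endpoint can be disabled so that it is reachable only from within your VPC and connected networks.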

graph TD
    subgraph "Google-Managed Control Plane"
        API[API Server Endpoint]
    end
    subgraph "Your GCP Project"
        subgraph "VPC Network"
            subgraph "GKE Subnet with Alias IPs"
                NP1[Node Pool A]
                NP2[Node Pool B]
                PODS[Pods with VPC IPs]
            end
        end
    end
    USER[You via kubectl] -->|HTTPS| API
    API -->|Specs| NP1
    API -->|Specs| NP2
    NP1 --> PODS
    NP2 --> PODS
    style API fill:#f44336
    style NP1 fill:#a5d6a7
    style NP2 fill:#a5d6a7
    style PODS fill:#fff9c4

GKE Note: kubectl get nodes shows your Workers, but kubectl get pods -n kube-system won't show API Server or etcd — they exist in Google's infrastructure. This is both a convenience (no operational burden) and a constraint (no direct control over control plane configuration).

🛑 PAUSE & RECALL — 3 minutes

Without looking back:

Name all four Control Plane components and their kitchen roles.
What are the three components on every Worker Node?
On GKE, what do you manage and what does Google manage?

Rate your confidence (0–4).

2.2.8 Tracing a Deployment Through the System

When you run kubectl apply -f deployment.yaml, here's the flow:

sequenceDiagram
    participant U as You (kubectl)
    participant API as API Server
    participant E as etcd
    participant CTRL as Controller Manager
    participant S as Scheduler
    participant K as kubelet
    participant R as containerd
    U->>API: POST Deployment YAML
    API->>API: Authenticate, validate
    API->>E: Write Deployment spec
    CTRL->>API: Watch: new Deployment
    CTRL->>API: Create ReplicaSet (replicas: 3)
    API->>E: Write ReplicaSet spec
    CTRL->>API: Create 3 Pod objects
    API->>E: Write Pod specs (Pending)
    S->>API: Watch: unassigned Pods
    S->>S: Filter nodes, score nodes
    S->>API: Bind Pod to Node X
    API->>E: Update Pod with nodeName
    K->>API: Watch: Pod assigned to me
    K->>R: Create container
    R-->>K: Container Running
    K->>API: Report status: Running
    API->>E: Update Pod status

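Every step goes through the API Server. No control plane component talks directly to another; the only direct hops are the API Server writing to etcd and each kubelet driving its local container runtime. etcd writes are sequential and consistent. The control loops ensure that even if a component crashes mid-flow, the system converges to the desired state upon recovery.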
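The flow above can be replayed with a toy model in which every component reads and writes only through one API object. Everything here is a simplified stand-in: the class and function names are hypothetical, the store is a plain dictionary rather than etcd, and the scheduler uses round-robin as a placeholder for filter-then-score.

```python
class ApiServer:
    """The only object allowed to touch the store (the stand-in for etcd)."""

    def __init__(self):
        self._store = {}
        self.log = []                                # ordered record of writes

    def write(self, actor, key, obj):
        self._store[key] = obj
        self.log.append(f"{actor} wrote {key}")

    def list(self, kind):
        # Return a copy so callers can write while iterating over the result.
        return {k: v for k, v in self._store.items() if k.startswith(kind + "/")}


def deployment_controller(api):
    for dep in api.list("deployment").values():
        key = f"replicaset/{dep['name']}"
        if key not in api.list("replicaset"):
            api.write("deployment-controller", key,
                      {"name": dep["name"], "replicas": dep["replicas"]})


def replicaset_controller(api):
    for rs in api.list("replicaset").values():
        owned = [k for k in api.list("pod") if k.startswith(f"pod/{rs['name']}-")]
        for i in range(len(owned), rs["replicas"]):
            api.write("replicaset-controller", f"pod/{rs['name']}-{i}",
                      {"node": None, "phase": "Pending"})


def scheduler(api, nodes):
    for i, (key, pod) in enumerate(sorted(api.list("pod").items())):
        if pod["node"] is None:                      # unassigned Pod: bind it
            api.write("scheduler", key, {**pod, "node": nodes[i % len(nodes)]})


def kubelet(api, node):
    for key, pod in api.list("pod").items():
        if pod["node"] == node and pod["phase"] == "Pending":
            api.write(f"kubelet@{node}", key, {**pod, "phase": "Running"})


api = ApiServer()
api.write("kubectl", "deployment/web", {"name": "web", "replicas": 3})
deployment_controller(api)
replicaset_controller(api)
scheduler(api, ["node-a", "node-b"])
for node in ("node-a", "node-b"):
    kubelet(api, node)

print("\n".join(api.log))
```

The log shows each actor writing in turn without any of them calling another directly. Because every function only compares what exists with what should exist, rerunning any of them is harmless, which is the convergence property the paragraph above describes.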

Lab: LAB-2.2 — Exploring Kubernetes Architecture (60 min)

Step 1: Examine Your Nodes (10 min)

kubectl get nodes -o wide
kubectl describe node $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')

Look for: Capacity vs. Allocatable (what Pods can use after system reservations), Conditions (Ready, MemoryPressure), and the container runtime version.
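The gap between Capacity and Allocatable is plain subtraction. The sketch below uses the upstream formula (capacity minus kube-reserved, system-reserved, and the eviction threshold); every number in it is invented for illustration, so substitute the values from your own kubectl describe node output.

```python
def allocatable_mi(capacity_mi: int, kube_reserved_mi: int,
                   system_reserved_mi: int, eviction_threshold_mi: int) -> int:
    """Memory left for Pods after the Node sets aside its own reservations."""
    return (capacity_mi - kube_reserved_mi
            - system_reserved_mi - eviction_threshold_mi)


# Hypothetical Node with 4 GiB of memory.
available = allocatable_mi(capacity_mi=4096, kube_reserved_mi=1024,
                           system_reserved_mi=0, eviction_threshold_mi=100)

print(available)          # MiB that Pods can request on this Node
print(available // 64)    # how many 64Mi requests fit by memory alone
```

With these made-up figures, 2,972 MiB is allocatable, enough for 46 Pods that each request 64Mi if memory were the only constraint.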

Step 2: View System Pods (10 min)

kubectl get pods -n kube-system -o wide

Look for kube-dns/coredns, fluentbit-gke, and gke-metrics-agent. Notice: no API Server, etcd, Scheduler, or Controller Manager pods — these are Google-managed.

Step 3: Access the Raw REST API (15 min)

kubectl proxy &
sleep 2   # give the proxy a moment to start listening on port 8001
curl http://localhost:8001/api/
curl http://localhost:8001/api/v1/pods
kill %1   # stop the background proxy

Everything in Kubernetes is a REST resource with a URL — this is what kubectl talks to under the hood.

Step 4: View Control Plane Logs (10 min)

gcloud logging read "protoPayload.serviceName=\"container.googleapis.com\"" --limit=5 --format="table(timestamp,protoPayload.methodName)"

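This filter returns audit entries for the GKE API itself, such as cluster and node pool operations. To see who created which Kubernetes resources and when, which is essential for security auditing, filter on protoPayload.serviceName="k8s.io" instead.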

Step 5: Deploy and Observe Self-Healing — The "Aha" Moment (15 min)

cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
EOF

kubectl wait --for=condition=available --timeout=60s deployment/nginx-demo
POD=$(kubectl get pods -l app=nginx-demo -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD" --wait=false
kubectl get pods -l app=nginx-demo -w

What you should see: The deleted pod enters Terminating. Within seconds, a new pod appears with a different name. The ReplicaSet Controller noticed actual state (2 pods) didn't match desired state (3) and created a replacement — the control loop in action.

Press Ctrl+C, then clean up: kubectl delete deployment nginx-demo.

Chapter Summary

The Control Plane — API Server, etcd, Scheduler, and Controller Manager — is the cluster's brain. The API Server is the exclusive front door; etcd is the single source of truth; the Scheduler decides placement through filter-then-score; and the Controller Manager's control loops make Kubernetes self-healing. Worker Nodes execute workloads through kubelet, kube-proxy, and containerd.

On GKE, Google manages the entire Control Plane, and regional clusters get a multi-zonal control plane without operational burden. VPC-native networking, node auto-provisioning, control plane logs, and master authorized networks extend core Kubernetes with cloud-native capabilities.

The deepest insight is the control loop: Kubernetes continuously reconciles actual state with desired state. This is why deleting a pod triggers automatic replacement, why scaling is declarative, and why the system is resilient. Understanding this pattern separates administrators who can troubleshoot from those who can only copy-paste.

📇 KEY CONCEPT CARDS

Q: What are the four Control Plane components and their roles?
A: (1) API Server — front door for all communication, exclusive etcd access; (2) etcd — distributed key-value store for all cluster state, uses Raft consensus; (3) Scheduler — filter-then-score algorithm for Pod-to-Node placement; (4) Controller Manager — runs control loops that reconcile actual state with desired state.

Q: Why is etcd the most critical component to protect?
A: etcd contains the entire desired state — every Pod, Service, Deployment, Secret. Losing it without a backup means losing all cluster configuration. Running Pods may continue, but Kubernetes cannot recover or reconcile.

Q: What is the control loop pattern, and why is it fundamental?
A: A continuous cycle: observe current state → compare with desired state → act to reduce the gap → repeat. Controllers never stop reconciling, making the cluster self-healing: it constantly converges reality toward your declared intentions.

Q: What are the three Worker Node components and their functions?
A: (1) kubelet — node agent that receives Pod specs, manages containers, reports status; (2) kube-proxy — maintains network rules (iptables/IPVS) implementing Service load balancing; (3) containerd — pulls images and executes containers.

Q: On GKE, what does Google manage vs. what do you manage?
A: Google manages the entire Control Plane, container runtime, CNI, system DaemonSets, control plane upgrades, and etcd backups. On Standard, you manage node pools. On Autopilot, Google also manages the nodes, so you manage only your Pods and workloads.