Chapter 5.1 · Volumes and Persistent Storage

Module 5: Storage, Configuration, and Secrets

Applications need data and configuration. This module covers everything from temporary scratch space to production database storage, plus how to manage sensitive configuration.

Module 5 of 8 | Difficulty: Intermediate

You now understand how Pods communicate — Services provide stable addresses, Ingress routes external traffic, and DNS resolves names within the cluster. But what happens to your data when a Pod restarts? Imagine you've deployed a MySQL database, loaded it with customer records, and the Pod crashes. Kubernetes creates a replacement — but the database is empty. Your data vanished. In this chapter, you'll learn why that happens, how Kubernetes solves it through volumes, and how to build stateful workloads that survive restarts.

5.1.1 The Container Storage Problem

Containers are ephemeral by design — lightweight, disposable, and replaceable. But this is also their Achilles' heel for data. Every container has a writable filesystem layer that exists only as long as the container runs. When Kubernetes replaces a crashed Pod, the new container starts from the original image — a completely fresh filesystem with no trace of what was written before.

For stateless applications, this is fine. For stateful apps — databases, file servers, message queues — it's catastrophic.

⚠️ Common Misconception: Some learners think restartPolicy: Always preserves the writable container layer. It does not. The policy only tells the kubelet to start a new container when one exits, and that container gets a fresh writable layer from the image. Only volumes persist data across restarts.

5.1.2 Volume Types Overview

Kubernetes supports many volume types. Choosing the wrong one can lead to data loss or performance problems.

Visual Description: The volume types spectrum ranges from fully ephemeral to fully persistent.

graph LR
    subgraph "Ephemeral → Persistent Spectrum"
        direction LR
        A[emptyDir Pod-shared temp Lost on pod death] --> B[hostPath Node-local disk Tied to one node]
        B --> C[configMap/secret Read-only config Stored in etcd]
        C --> D[PersistentVolume Network storage Survives everything]
    end
    style A fill:#ffcc80
    style B fill:#ffab91
    style C fill:#ce93d8
    style D fill:#a5d6a7

emptyDir is the simplest volume. Kubernetes creates an empty directory when the Pod is assigned to a node, and all containers in the Pod can share it. It lives on node disk (or in RAM with medium: Memory). It survives container crashes and restarts within the Pod, but when the Pod is removed from the node, the emptyDir is deleted. Use it for temporary scratch space.
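As a minimal sketch of the emptyDir behavior described above, the manifest below shares one scratch directory between two containers. The Pod, container, and volume names and the busybox image tag are illustrative assumptions, not part of the course material.

```yaml
# Sketch: two containers sharing an emptyDir scratch volume.
# All names here (scratch-demo, writer, reader, scratch) are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
  - name: writer
    image: busybox:1.36
    # Append a timestamp to a shared file every five seconds.
    command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: reader
    image: busybox:1.36
    # Follow the file that the writer container appends to.
    command: ["sh", "-c", "tail -F /scratch/log.txt"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}   # use "emptyDir: {medium: Memory}" for a RAM-backed volume
```

Deleting this Pod and recreating it should show an empty /scratch directory again, since the volume's lifetime is tied to the Pod.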

hostPath mounts a file or directory from the host node's filesystem into the Pod. This breaks portability — the Pod is tied to a specific node. It's an anti-pattern for stateful applications and is restricted on GKE Autopilot.

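configMap and secret volumes project configuration data into Pods as read-only files. ConfigMaps hold non-sensitive data; Secrets hold sensitive data. Secret values are base64-encoded in the API, which is an encoding and not encryption, so access to them should be restricted. Neither provides general-purpose writable storage.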
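To make the projection concrete, here is a small sketch in which a ConfigMap key becomes a read-only file inside a Pod. The object names, the file name, and its contents are assumptions chosen for illustration.

```yaml
# Sketch: a ConfigMap key projected as a file under /etc/app.
# app-config, config-demo, and app.properties are hypothetical names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    log.level=info
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    # Print the projected file, then idle so the Pod can be inspected.
    command: ["sh", "-c", "cat /etc/app/app.properties; sleep 3600"]
    volumeMounts:
    - name: config
      mountPath: /etc/app
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config   # each key in the ConfigMap becomes one file
```

A secret volume follows the same shape, with a secret entry (and secretName) in place of the configMap entry.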

Persistent Volumes are cluster resources backed by networked storage — GCE Persistent Disks, AWS EBS, NFS shares, or any system with a CSI driver. They exist independently of any Pod and survive Pod deletion and node failure.

5.1.3 PersistentVolumes and PersistentVolumeClaims

Kubernetes separates storage provisioning (admin responsibility) from storage consumption (developer responsibility). A PersistentVolume (PV) is a cluster resource representing provisioned storage. It contains capacity, access mode, storage class, reclaim policy, and a reference to the actual backing storage.

A PersistentVolumeClaim (PVC) is a user's request for storage. A developer writes a PVC saying "I need 5Gi with ReadWriteOnce access." Kubernetes finds a matching PV (or dynamically provisions one) and binds them. The developer does not need to know which disk, server, or cloud API backs the claim; they simply mount the claim into their Pod.

Visual Description: The PVC-to-PV binding flow.

sequenceDiagram
    participant Dev as Developer
    participant PVC as PVC (Storage Request)
    participant K8s as Kubernetes
    participant PV as PV (Storage Locker)
    Dev->>PVC: Create ("5Gi, RWO")
    PVC->>K8s: Submit request
    K8s->>PV: Find matching PV
    K8s-->>PVC: Status: Bound
    Dev->>PVC: Reference in Pod spec
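The binding flow can be sketched with a statically provisioned PV and a claim that matches it. This is only an illustration: the NFS server address, export path, and object names are hypothetical, and an admin would substitute whatever backing storage the cluster actually has.

```yaml
# Sketch: an admin-created PV and a developer's PVC that binds to it.
# nfs.example.internal and /exports/demo are placeholder values.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-demo
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data if the claim is deleted
  storageClassName: ""                    # no class: available for static binding
  nfs:
    server: nfs.example.internal
    path: /exports/demo
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""   # empty string: match a pre-created PV, skip dynamic provisioning
  resources:
    requests:
      storage: 5Gi
```

Kubernetes binds the claim to the PV because capacity, access mode, and storage class all match; a Pod then references demo-claim and never mentions the NFS details.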

Access Modes:

Mode	Meaning	Use Case
ReadWriteOnce (RWO)	One node can read/write	Single-instance databases
ReadOnlyMany (ROX)	Many nodes can read	Shared static assets
ReadWriteMany (RWX)	Many nodes can read/write	Shared file storage

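⚠️ Common Misconception: RWO means "one Pod" can use the volume. It actually means "one node": multiple Pods on the same node can mount an RWO volume. To restrict a volume to a single Pod, Kubernetes offers a fourth access mode, ReadWriteOncePod (RWOP), for CSI-backed volumes.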
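When a volume really must be limited to one Pod rather than one node, recent Kubernetes versions provide the ReadWriteOncePod access mode for CSI volumes. The claim below is a sketch under that assumption; the claim name is hypothetical, and the standard-rwo class is assumed to be available as on GKE.

```yaml
# Sketch: a claim that only a single Pod in the whole cluster may use.
# Requires a CSI-backed StorageClass and a Kubernetes version that supports
# ReadWriteOncePod; single-pod-claim is an illustrative name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-claim
spec:
  accessModes:
  - ReadWriteOncePod
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard-rwo
```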

Reclaim Policies: Retain preserves PV and data (safest for production). Delete removes underlying storage (default for dynamic provisioning). Recycle is deprecated.

Volume Binding Modes: Immediate provisions the PV as soon as the PVC is created, but can cause zone mismatches. WaitForFirstConsumer delays provisioning until a Pod uses the PVC, ensuring storage is created in the same zone. Use WaitForFirstConsumer whenever possible.

5.1.4 StorageClasses and Dynamic Provisioning

Manually creating PVs for every request doesn't scale. StorageClasses enable dynamic provisioning — storage created automatically on demand. A StorageClass defines the provisioner, parameters (disk type, zones, encryption), reclaim policy, and volume binding mode.
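A StorageClass carrying the four fields just listed might look like the sketch below. The name fast-ssd and the pd-ssd disk type are illustrative choices for a GKE cluster; other provisioners accept different parameters.

```yaml
# Sketch: a StorageClass for SSD-backed Persistent Disks on GKE.
# The class name is an assumption; parameters are specific to the PD CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io        # which CSI driver creates the disk
parameters:
  type: pd-ssd                            # disk type passed to the driver
reclaimPolicy: Delete                     # remove the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the zone of the first Pod
allowVolumeExpansion: true                # permit growing PVCs later
```

A PVC selects this tier simply by setting storageClassName: fast-ssd.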

Your cluster can have multiple StorageClasses:

StorageClass	Provisioner	Use Case
`fast-ssd`	`pd.csi.storage.gke.io`	Databases, high-IOPS
`standard-rwo`	`pd.csi.storage.gke.io`	General purpose
`filestore`	`filestore.csi.storage.gke.io`	Shared RWX storage

One StorageClass can be marked as default with the annotation storageclass.kubernetes.io/is-default-class: "true". When a PVC doesn't specify a storageClassName, the default is used.
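The default-class mechanism can be sketched as follows. The class and claim names are hypothetical, and the example assumes no other StorageClass in the cluster is already marked as default (GKE clusters normally ship with one).

```yaml
# Sketch: a default StorageClass and a PVC that relies on it.
# general-purpose and uses-default-class are illustrative names.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-purpose
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uses-default-class
spec:
  # No storageClassName field at all: the default class is filled in.
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

Note the difference from storageClassName: "" (empty string), which explicitly opts out of the default class and of dynamic provisioning.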

Visual Description: Dynamic provisioning sequence.

sequenceDiagram
    participant User as User
    participant PVC as PVC ("5Gi, fast-ssd")
    participant SC as StorageClass
    participant Prov as CSI Provisioner
    participant Cloud as Cloud Provider
    User->>PVC: Create PVC
    PVC->>SC: References template
    SC->>Prov: Invoke with parameters
    Prov->>Cloud: Create SSD disk
    Cloud-->>Prov: Disk ready
    Prov-->>PVC: PV created & bound

5.1.5 GKE Storage Options

On GKE, you access Google's storage portfolio through CSI drivers.

Visual Description: GKE storage decision map.

graph TD
    A[Need storage?] --> B{RWX needed?}
    B -->|Yes| C[Filestore NFS-compatible]
    B -->|No| D{Ultra-high perf?}
    D -->|Yes| E[Hyperdisk Sub-ms latency]
    D -->|No| F{Cross-zone HA?}
    F -->|Yes| G[Regional PD Replicated]
    F -->|No| H[Zonal PD Standard/premium-rwo]
    A --> I[Object storage?] --> J[GCS Fuse Buckets as filesystem]

GCE Persistent Disk is the default on GKE. Zonal PDs live in one zone; Regional PDs replicate across two zones for HA. standard-rwo (balanced) and premium-rwo (SSD) are the built-in StorageClasses. PDs are block storage: excellent for databases, but they support only RWO and ROX, not RWX.
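Regional PDs are requested through StorageClass parameters rather than on the PVC. The following is a sketch assuming the GKE PD CSI driver; the class name regional-balanced is hypothetical.

```yaml
# Sketch: a StorageClass that provisions Regional Persistent Disks.
# regional-balanced is an illustrative name, not a built-in class.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-balanced
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd           # replicate the disk across two zones
volumeBindingMode: WaitForFirstConsumer   # pick zones based on where the Pod lands
```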

Filestore provides NFS-compatible shared file storage supporting RWX — multiple Pods across nodes can read and write simultaneously. Enable the Filestore CSI driver on your cluster.
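A shared RWX claim backed by Filestore could look like the sketch below. It assumes the Filestore CSI driver is enabled and that a class named standard-rwx is installed in the cluster; check kubectl get storageclass for the actual names. The 1Ti request reflects the large minimum instance sizes Filestore tiers typically impose, and the claim name is hypothetical.

```yaml
# Sketch: a ReadWriteMany claim for Filestore-backed shared storage.
# standard-rwx and shared-files are assumptions; verify the class name first.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files
spec:
  accessModes:
  - ReadWriteMany        # Pods on many nodes may read and write
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti
```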

Cloud Storage FUSE (GCS Fuse) mounts Cloud Storage buckets as filesystems. It's read-optimized for large datasets and ML training data, but lacks full POSIX semantics, so don't use it for databases.

Hyperdisk is Google's next-generation block storage with sub-millisecond latency for performance-critical workloads.

GKE Note: GKE uses the CSI architecture for all storage. The CSI driver runs as a controller on the control plane (for provisioning) and as a DaemonSet on every node (for mounting). View CSI pods with kubectl get pods -n kube-system | grep csi.

GKE in Practice

On a production GKE cluster:

Inspect StorageClasses: kubectl get storageclass
Choose by workload: premium-rwo for databases, standard-rwo for general, Filestore for RWX.
Use a StorageClass with volumeBindingMode: WaitForFirstConsumer to avoid zone mismatches (the setting lives on the StorageClass, not on the PVC).
Monitor PVC capacity and expand before running out.
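Expanding a volume is done by raising the storage request on the existing PVC, provided its StorageClass sets allowVolumeExpansion: true. The sketch below assumes a 5Gi claim named mysql-data (the one used in this chapter's lab) and that its class permits expansion; the 10Gi target is an arbitrary example.

```yaml
# Sketch: re-applying an existing PVC with a larger request to grow it.
# Only spec.resources.requests.storage changes; shrinking is not supported.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi      # raised from 5Gi
  storageClassName: standard-rwo
```

After kubectl apply, the PVC's reported capacity should update once the driver has resized the disk and filesystem.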

🛑 PAUSE & RECALL — 2 minutes

What happens to data written inside a container's own filesystem when the Pod restarts?
Name three volume types and identify which are ephemeral vs. persistent.
What is the difference between a PV and a PVC — who creates each?
What does volumeBindingMode: WaitForFirstConsumer prevent?

Rate your confidence (0-4).

5.1.6 Analogy: Storage Lockers and Warehouses


Think of Kubernetes storage as facilities serving a city of temporary apartments (Pods):

emptyDir is a shared kitchen counter inside one apartment. Roommates (containers) use it during the day, but everything gets wiped when everyone moves out.

hostPath is a locker bolted to one specific building (node). If you move buildings, you can't access it anymore.

PersistentVolume is a rented unit at an external warehouse. The warehouse exists independently of any apartment. You sign a contract (PVC), get assigned a unit (PV binding), and your belongings stay there regardless of which apartment you live in tomorrow.

StorageClass is the warehouse's tier system. "Standard tier" gets basic shelving. "Premium tier" gets a climate-controlled vault. You choose when signing.

Dynamic Provisioning is the warehouse's automated system. Submit a request and robots instantly construct a unit of the exact size and tier.

The key insight: apartments (Pods) come and go, but warehouse units (PersistentVolumes) endure.

5.1.7 Visual Description: Storage Architecture Diagram

Visual Description: The storage architecture shows data flow from Pod through to physical storage, with GKE options branching at the provisioner.

graph TD
    subgraph "Pod [Apartment]"
        CONT[Container]
        VM[volumeMount /var/lib/mysql]
    end
    subgraph "Kubernetes Abstraction Layer"
        PVC[PVC mysql-data 5Gi RWO]
        SC[StorageClass standard-rwo]
    end
    subgraph "Provisioned Resources"
        PV[PV pvc-xxx-yyy 5Gi GCE PD]
        Prov[CSI Provisioner pd.csi.storage.gke.io]
    end
    subgraph "GKE Physical Storage Options"
        Zonal[Zonal PD]
        Regional[Regional PD]
        File[Filestore NFS RWX]
        Hyper[Hyperdisk]
    end
    CONT --> VM
    VM --> PVC
    PVC --> SC
    PVC --> PV
    SC --> Prov
    Prov --> Zonal
    Prov --> Regional
    Prov --> File
    Prov --> Hyper

Trace the flow: The container writes to /var/lib/mysql, a volumeMount pointing to PVC mysql-data. The PVC references the standard-rwo StorageClass and is bound to a PV backed by a GCE Persistent Disk. Requesting a different StorageClass would yield a different backend, such as Hyperdisk or Filestore. Note that Filestore is provisioned by its own CSI driver (filestore.csi.storage.gke.io); the diagram collapses the drivers into a single provisioner box for simplicity.

🤔 TRY BEFORE YOU SEE

You need to deploy a PostgreSQL database on GKE with these requirements:

Data must survive Pod restarts and rescheduling
Only one Pod will access the database
You want SSD storage for good performance
Storage should provision automatically without admin-created PVs

List the Kubernetes objects you'd create and which StorageClass you'd use. Write your answer, then check below.

Reveal: You need: (1) a PVC requesting premium-rwo StorageClass with ReadWriteOnce, (2) a Secret for the password, and (3) a Deployment referencing both. The premium-rwo StorageClass triggers dynamic provisioning — GKE creates the SSD Persistent Disk and PV automatically.
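One possible shape for those three objects is sketched below. All object names, the postgres:16 image tag, the 10Gi size, and the placeholder password are assumptions for illustration. PGDATA points at a subdirectory because a freshly formatted disk usually contains a lost+found directory at its root, which the PostgreSQL image refuses to initialize into.

```yaml
# Sketch: PVC + Secret + Deployment for a single-instance PostgreSQL.
# Names, sizes, and the password value are hypothetical placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: premium-rwo     # SSD-backed, dynamically provisioned
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
stringData:
  POSTGRES_PASSWORD: change-me      # replace before any real use
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  strategy:
    type: Recreate                  # never run two Pods against the RWO disk
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: POSTGRES_PASSWORD
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-data
```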

5.1.8 Lab: LAB-5.1 — Persistent Storage (60 min)

In this lab, you'll create data in MySQL, delete the Pod, and watch your data survive.

Step 1: Create a PVC with Dynamic Provisioning

Save as mysql-pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard-rwo

Apply:

kubectl apply -f mysql-pvc.yaml
kubectl get pvc mysql-data
kubectl get pv

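On GKE, the built-in standard-rwo class uses volumeBindingMode: WaitForFirstConsumer, so the PVC stays Pending (with a "waiting for first consumer" event) and kubectl get pv shows no new volume yet. The disk is provisioned in Step 2, once a Pod that uses the claim is scheduled. Re-run both commands at that point to see the PVC Bound to a brand-new PV created automatically: dynamic provisioning in action. With a class that uses Immediate binding, the PVC would move from Pending to Bound right away.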

Step 2: Deploy MySQL with the PVC

Save as mysql-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "lab-password-123"
        - name: MYSQL_DATABASE
          value: "labdb"
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-data

kubectl apply -f mysql-deployment.yaml
kubectl get pods -l app=mysql -w

The Recreate strategy ensures only one MySQL Pod exists at a time, avoiding conflicts with the RWO volume.

Step 3: Create Data in MySQL

MYSQL_POD=$(kubectl get pod -l app=mysql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $MYSQL_POD -- mysql -uroot -p"lab-password-123" labdb -e "
  CREATE TABLE IF NOT EXISTS messages (id INT PRIMARY KEY, content VARCHAR(255));
  INSERT INTO messages VALUES (1, 'Hello Persistent Storage!');
  SELECT * FROM messages;
"

Step 4: Delete the Pod and Verify Data Survival

kubectl delete pod -l app=mysql
kubectl get pods -l app=mysql -w

Wait for the new Pod, then verify:

MYSQL_POD=$(kubectl get pod -l app=mysql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $MYSQL_POD -- mysql -uroot -p"lab-password-123" labdb -e "SELECT * FROM messages;"

Your data survived! The new Pod mounted the same PVC bound to the same PV — the Persistent Disk still contains all database files.

Step 5: Inspect the PV and StorageClass

kubectl describe pv $(kubectl get pvc mysql-data -o jsonpath='{.spec.volumeName}')
kubectl describe storageclass standard-rwo

Note the provisioner (pd.csi.storage.gke.io), reclaim policy (Delete), and volume binding mode. With Delete, removing the PVC also removes the GCE disk — be careful!

Step 6: Test the ReadWriteOnce Constraint

Save as second-mysql.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: second-mysql
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    volumeMounts:
    - name: mysql-storage
      mountPath: /var/lib/mysql
  volumes:
  - name: mysql-storage
    persistentVolumeClaim:
      claimName: mysql-data

kubectl apply -f second-mysql.yaml
kubectl describe pod second-mysql

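What you see depends on where the scheduler places the Pod. On a different node from the running MySQL Pod, it stays in the Pending phase (kubectl get pods shows ContainerCreating) and Events report a Multi-Attach error because the disk is already attached to another node. This is RWO enforcement. On the same node, RWO allows the mount, and the second mysqld will most likely fail to start because the first instance holds locks on the data files. Clean up: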

kubectl delete pod second-mysql
kubectl delete deployment mysql
kubectl delete pvc mysql-data

🛑 PAUSE & RECALL — 2 minutes

In the lab, why did MySQL data survive Pod deletion? Trace the path from container mount through PVC and PV to physical storage.
What would happen if you deleted the PVC? (Hint: think about the reclaim policy.)
Why did the second MySQL Pod stay in Pending?
Why was strategy: Recreate important for the Deployment?

Rate your confidence (0-4).

Chapter Summary

Containers are ephemeral — data written to a container's filesystem disappears on restart. Kubernetes solves this through volumes from ephemeral (emptyDir) to fully persistent (PersistentVolumes). The PV/PVC pattern separates storage provisioning from consumption. StorageClasses enable dynamic provisioning on demand. On GKE, choose from GCE PD (RWO), Filestore (RWX), GCS Fuse (object storage), and Hyperdisk (high performance). The defining moment: your database data survives Pod deletion because it lives on a PersistentVolume that outlives any container.

📇 KEY CONCEPT CARDS

Q: What happens to data written inside a container's filesystem when the container restarts? A: It is lost permanently. The container's writable layer is ephemeral. Only data written to a mounted volume persists.

Q: What is the difference between a PersistentVolume (PV) and a PersistentVolumeClaim (PVC)? A: A PV is cluster-level storage provisioned by an admin or dynamic provisioner. A PVC is a user's request for storage. Kubernetes binds a PVC to a matching PV based on capacity, access mode, and storage class.

Q: What does ReadWriteOnce (RWO) mean, and what is a common misconception about it? A: RWO means the volume can be mounted read-write by ONE node at a time. The common misconception is that it means "one Pod" — multiple Pods on the same node can share an RWO volume.

Q: What is a StorageClass, and why is dynamic provisioning valuable? A: A StorageClass is a template defining a storage tier (provisioner, parameters, reclaim policy). Dynamic provisioning automatically creates PVs and underlying storage when a PVC is submitted, eliminating the need for admins to manually provision storage for every request.
