Applications need data and configuration. This module covers everything from temporary scratch space to production database storage, plus how to manage sensitive configuration.
Module 5 of 8 | Difficulty: Intermediate
You now understand how Pods communicate — Services provide stable addresses, Ingress routes external traffic, and DNS resolves names within the cluster. But what happens to your data when a Pod restarts? Imagine you've deployed a MySQL database, loaded it with customer records, and the Pod crashes. Kubernetes creates a replacement — but the database is empty. Your data vanished. In this chapter, you'll learn why that happens, how Kubernetes solves it through volumes, and how to build stateful workloads that survive restarts.
5.1.1 The Container Storage Problem
Containers are ephemeral by design — lightweight, disposable, and replaceable. But this is also their Achilles' heel for data. Every container has a writable filesystem layer that exists only as long as the container runs. When Kubernetes replaces a crashed Pod, the new container starts from the original image — a completely fresh filesystem with no trace of what was written before.
For stateless applications, this is fine. For stateful apps — databases, file servers, message queues — it's catastrophic.
⚠️ Common Misconception: Some learners think restartPolicy: Always preserves the writable container layer. It does not — it only restarts the container process. Only volumes persist data across restarts.
5.1.2 Volume Types Overview
Kubernetes supports many volume types. Choosing the wrong one can lead to data loss or performance problems.
Visual Description: The volume types spectrum ranges from fully ephemeral to fully persistent.
emptyDir is the simplest volume. Kubernetes creates an empty directory when the Pod is scheduled, and all containers in the Pod can share it. It lives on node disk (or RAM with medium: Memory). It survives container restarts within the Pod, but when the Pod is removed from the node, the emptyDir is deleted. Use it for temporary scratch space.
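As a rough illustration of the scratch-space pattern, the manifest below sketches a Pod whose two containers share one emptyDir. The Pod name, container names, busybox image, and /scratch path are hypothetical choices for this example and are not part of the module's lab.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo              # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox:1.36
      # Appends a timestamp to the shared file every 5 seconds
      command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
    - name: reader
      image: busybox:1.36
      # Creates the file if the writer has not yet, then follows it
      command: ["sh", "-c", "touch /scratch/log.txt; tail -f /scratch/log.txt"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}                # use "emptyDir: {medium: Memory}" to back it with RAM
```

Deleting this Pod discards /scratch/log.txt along with it, which is the behavior that makes emptyDir unsuitable for anything you need to keep.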
hostPath mounts a file or directory from the host node's filesystem into the Pod. This breaks portability — the Pod is tied to a specific node. It's an anti-pattern for stateful applications and is restricted on GKE Autopilot.
configMap and secret volumes project configuration data into Pods as read-only files. ConfigMaps hold non-sensitive data; Secrets hold sensitive data, stored base64-encoded (an encoding, not encryption). Neither provides general-purpose writable storage.
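A minimal sketch of a secret volume, assuming a hypothetical Secret named app-credentials and a throwaway busybox container; a configMap volume follows the same shape with configMap: and name: in place of secret: and secretName:.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials           # hypothetical name
type: Opaque
stringData:                       # written as plain text; stored base64-encoded under .data
  db-password: "example-only"
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Each key in the Secret appears as a file under the mount path
      command: ["sh", "-c", "cat /etc/creds/db-password; sleep 3600"]
      volumeMounts:
        - name: creds
          mountPath: /etc/creds
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: app-credentials
```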
Persistent Volumes are cluster resources backed by networked storage — GCE Persistent Disks, AWS EBS, NFS shares, or any system with a CSI driver. They exist independently of any Pod and survive Pod deletion and node failure.
5.1.3 PersistentVolumes and PersistentVolumeClaims
Kubernetes separates storage provisioning (admin responsibility) from storage consumption (developer responsibility). A PersistentVolume (PV) is a cluster resource representing provisioned storage. It contains capacity, access mode, storage class, reclaim policy, and a reference to the actual backing storage.
A PersistentVolumeClaim (PVC) is a user's request for storage. A developer writes a PVC saying "I need 5Gi with ReadWriteOnce access." Kubernetes finds a matching PV (or dynamically provisions one) and binds them. The developer never needs to know whether the storage is SSD or HDD — they just get a claim to mount into their Pod.
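To make the admin/developer split concrete, here is a sketch of static provisioning: an admin-created PV and a developer's PVC that can bind to it. The NFS server address, export path, object names, and the storageClassName value manual are all hypothetical; on GKE you would more often rely on dynamic provisioning instead.

```yaml
# Admin side: a PV describing storage that already exists
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv             # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual        # hypothetical label used only to match PV and PVC
  nfs:
    server: nfs.example.internal  # hypothetical NFS server
    path: /exports/data           # hypothetical export path
---
# Developer side: a claim stating what is needed, not where it lives
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 5Gi
```

Kubernetes binds the claim to the PV because capacity, access mode, and storage class all match; the Pod then references only app-data.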
Visual Description: The PVC-to-PV binding flow.
Access Modes:
| Mode | Meaning | Use Case |
|---|---|---|
| ReadWriteOnce (RWO) | One node can read/write | Single-instance databases |
| ReadOnlyMany (ROX) | Many nodes can read | Shared static assets |
| ReadWriteMany (RWX) | Many nodes can read/write | Shared file storage |
⚠️ Common Misconception: RWO is often read as "one Pod can use the volume." It actually means "one node": multiple Pods on the same node can mount an RWO volume at the same time.
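If you do need a true single-Pod guarantee, Kubernetes also defines a fourth access mode, ReadWriteOncePod, which is not listed in the table above. The sketch below assumes a CSI-backed StorageClass and a cluster version recent enough to support that mode; the claim name is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data           # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod            # only one Pod in the whole cluster may use this claim
  storageClassName: standard-rwo  # assumes this class exists, as on GKE
  resources:
    requests:
      storage: 5Gi
```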
Reclaim Policies: Retain preserves PV and data (safest for production). Delete removes underlying storage (default for dynamic provisioning). Recycle is deprecated.
Volume Binding Modes: Immediate provisions the PV as soon as the PVC is created, but can cause zone mismatches. WaitForFirstConsumer delays provisioning until a Pod uses the PVC, ensuring storage is created in the same zone. Use WaitForFirstConsumer whenever possible.
5.1.4 StorageClasses and Dynamic Provisioning
Manually creating PVs for every request doesn't scale. StorageClasses enable dynamic provisioning — storage created automatically on demand. A StorageClass defines the provisioner, parameters (disk type, zones, encryption), reclaim policy, and volume binding mode.
Your cluster can have multiple StorageClasses:
| StorageClass | Provisioner | Use Case |
|---|---|---|
| fast-ssd | pd.csi.storage.gke.io | Databases, high-IOPS |
| standard-rwo | pd.csi.storage.gke.io | General purpose |
| filestore | filestore.csi.storage.gke.io | Shared RWX storage |
One StorageClass can be marked as default with the annotation storageclass.kubernetes.io/is-default-class: "true". When a PVC doesn't specify a storageClassName, the default is used.
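A sketch of what the fast-ssd class from the table might look like, assuming it maps to SSD Persistent Disks through the type: pd-ssd parameter. The reclaim policy, binding mode, and expansion setting shown here are illustrative choices, not values taken from a real cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    # Set to "true" on at most one class to make it the cluster default
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd                    # assumed disk type for this tier
reclaimPolicy: Delete             # use Retain to keep the disk after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true        # lets PVCs of this class be resized later
```

Any PVC that names storageClassName: fast-ssd would then trigger the provisioner to create a matching disk and PV.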
Visual Description: Dynamic provisioning sequence.
5.1.5 GKE Storage Options
On GKE, you access Google's storage portfolio through CSI drivers.
Visual Description: GKE storage decision map.
GCE Persistent Disk is the default on GKE. Zonal PDs live in one zone; Regional PDs replicate across two zones for HA. standard-rwo (balanced) and premium-rwo (SSD) are the built-in StorageClasses. PDs are block storage: excellent for databases, but they cannot be mounted read-write from multiple nodes, so use them with RWO (ROX is possible for read-only sharing) and look elsewhere for RWX.
Filestore provides NFS-compatible shared file storage supporting RWX — multiple Pods across nodes can read and write simultaneously. Enable the Filestore CSI driver on your cluster.
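A minimal sketch of an RWX claim, assuming the Filestore CSI driver is enabled and that a StorageClass named filestore (the name used in the table in 5.1.4) exists in your cluster. The claim name is hypothetical, and the 1Ti request is an assumption: Filestore tiers impose minimum capacities, so check the minimum for the tier you use.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files              # hypothetical name
spec:
  accessModes:
    - ReadWriteMany               # Pods on different nodes can read and write
  storageClassName: filestore     # assumed class backed by filestore.csi.storage.gke.io
  resources:
    requests:
      storage: 1Ti                # assumed size; tiers have minimum capacities
```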
GCS Fuse mounts Cloud Storage buckets as filesystems. It's read-optimized for large datasets and ML training data, but lacks full POSIX semantics — don't use it for databases.
Hyperdisk is Google's next-generation block storage with sub-millisecond latency for performance-critical workloads.
GKE Note: GKE uses the CSI architecture for all storage. The CSI driver runs as a controller on the control plane (for provisioning) and as a DaemonSet on every node (for mounting). View CSI pods with kubectl get pods -n kube-system | grep csi.
GKE in Practice
On a production GKE cluster:
- Inspect StorageClasses: kubectl get storageclass
- Choose by workload: premium-rwo for databases, standard-rwo for general, Filestore for RWX.
- Set volumeBindingMode: WaitForFirstConsumer to avoid zone mismatches.
- Monitor PVC capacity and expand before running out.
🛑 PAUSE & RECALL — 2 minutes
- What happens to data written inside a container's own filesystem when the Pod restarts?
- Name three volume types and identify which are ephemeral vs. persistent.
- What is the difference between a PV and a PVC — who creates each?
- What does volumeBindingMode: WaitForFirstConsumer prevent?
Rate your confidence (0-4).
5.1.6 Analogy: Storage Lockers and Warehouses
Think of Kubernetes storage as facilities serving a city of temporary apartments (Pods):
- emptyDir is a shared kitchen counter inside one apartment. Roommates (containers) use it during the day, but everything gets wiped when everyone moves out.
- hostPath is a locker bolted to one specific building (node). If you move buildings, you can't access it anymore.
- PersistentVolume is a rented unit at an external warehouse. The warehouse exists independently of any apartment. You sign a contract (PVC), get assigned a unit (PV binding), and your belongings stay there regardless of which apartment you live in tomorrow.
- StorageClass is the warehouse's tier system. "Standard tier" gets basic shelving. "Premium tier" gets a climate-controlled vault. You choose when signing.
- Dynamic Provisioning is the warehouse's automated system. Submit a request and robots instantly construct a unit of the exact size and tier.
The key insight: apartments (Pods) come and go, but warehouse units (PersistentVolumes) endure.
5.1.7 Visual Description: Storage Architecture Diagram
Visual Description: The storage architecture shows data flow from Pod through to physical storage, with GKE options branching at the provisioner.
Trace the flow: The container writes to /var/lib/mysql, a volumeMount pointing to PVC mysql-data. The PVC references the standard-rwo StorageClass and is bound to a PV backed by a GCE Persistent Disk. The provisioner could just as easily create Filestore or Hyperdisk depending on which StorageClass you requested.
🤔 TRY BEFORE YOU SEE
You need to deploy a PostgreSQL database on GKE with these requirements:
- Data must survive Pod restarts and rescheduling
- Only one Pod will access the database
- You want SSD storage for good performance
- Storage should provision automatically without admin-created PVs
List the Kubernetes objects you'd create and which StorageClass you'd use. Write your answer, then check below.
Reveal: You need: (1) a PVC requesting premium-rwo StorageClass with ReadWriteOnce, (2) a Secret for the password, and (3) a Deployment referencing both. The premium-rwo StorageClass triggers dynamic provisioning — GKE creates the SSD Persistent Disk and PV automatically.
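The three objects in the answer could be sketched as follows. All names, the postgres:16 image tag, the 10Gi size, and the placeholder password are assumptions for illustration. PGDATA points at a subdirectory because a freshly formatted disk usually contains a lost+found directory that PostgreSQL refuses to initialize over.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret           # hypothetical name
type: Opaque
stringData:
  password: "change-me"           # placeholder; never commit real passwords
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data             # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-rwo   # SSD-backed class; triggers dynamic provisioning
  resources:
    requests:
      storage: 10Gi               # assumed size
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  strategy:
    type: Recreate                # stop the old Pod before starting a new one
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16      # assumed tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-data
```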
5.1.8 Lab: LAB-5.1 — Persistent Storage (60 min)
In this lab, you'll create data in MySQL, delete the Pod, and watch your data survive.
Step 1: Create a PVC with Dynamic Provisioning
Save as mysql-pvc.yaml:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard-rwo
```
Apply:
```bash
kubectl apply -f mysql-pvc.yaml
kubectl get pvc mysql-data
kubectl get pv
```
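Your PVC shows Pending, then Bound. How quickly depends on the StorageClass's volume binding mode: with Immediate, the PVC binds right away; with WaitForFirstConsumer (which GKE's standard-rwo typically uses), it stays Pending and no PV appears until the MySQL Pod in Step 2 is scheduled. Once bound, kubectl get pv reveals a brand-new PV created automatically, which is dynamic provisioning in action.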
Step 2: Deploy MySQL with the PVC
Save as mysql-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "lab-password-123"   # lab only; use a Secret in production
            - name: MYSQL_DATABASE
              value: "labdb"
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-storage
          persistentVolumeClaim:
            claimName: mysql-data
```

```bash
kubectl apply -f mysql-deployment.yaml
kubectl get pods -l app=mysql -w
```
The Recreate strategy ensures only one MySQL Pod exists at a time, avoiding conflicts with the RWO volume.
Step 3: Create Data in MySQL
```bash
MYSQL_POD=$(kubectl get pod -l app=mysql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it "$MYSQL_POD" -- mysql -uroot -p"lab-password-123" labdb -e "
CREATE TABLE IF NOT EXISTS messages (id INT PRIMARY KEY, content VARCHAR(255));
INSERT INTO messages VALUES (1, 'Hello Persistent Storage!');
SELECT * FROM messages;
"
```
Step 4: Delete the Pod and Verify Data Survival
```bash
kubectl delete pod -l app=mysql
kubectl get pods -l app=mysql -w
```
Wait for the new Pod, then verify:
```bash
MYSQL_POD=$(kubectl get pod -l app=mysql -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it "$MYSQL_POD" -- mysql -uroot -p"lab-password-123" labdb -e "SELECT * FROM messages;"
```
Your data survived! The new Pod mounted the same PVC bound to the same PV — the Persistent Disk still contains all database files.
Step 5: Inspect the PV and StorageClass
```bash
kubectl describe pv "$(kubectl get pvc mysql-data -o jsonpath='{.spec.volumeName}')"
kubectl describe storageclass standard-rwo
```
Note the provisioner (pd.csi.storage.gke.io), reclaim policy (Delete), and volume binding mode. With Delete, removing the PVC also removes the GCE disk — be careful!
Step 6: Test the ReadWriteOnce Constraint
Save as second-mysql.yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: second-mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumes:
    - name: mysql-storage
      persistentVolumeClaim:
        claimName: mysql-data
```

```bash
kubectl apply -f second-mysql.yaml
kubectl describe pod second-mysql
```
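If the scheduler places second-mysql on a different node from the running MySQL Pod, it never starts: its phase stays Pending (kubectl get pod typically shows ContainerCreating), and Events report that the volume is already attached to another node. This is RWO enforcement. If it happens to land on the same node, the mount is allowed, because RWO restricts nodes, not Pods. Clean up: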
```bash
kubectl delete pod second-mysql
kubectl delete deployment mysql
kubectl delete pvc mysql-data
```
🛑 PAUSE & RECALL — 2 minutes
- In the lab, why did MySQL data survive Pod deletion? Trace the path from container mount through PVC and PV to physical storage.
- What would happen if you deleted the PVC? (Hint: think about the reclaim policy.)
- Why did the second MySQL Pod stay in Pending?
- Why was strategy: Recreate important for the Deployment?
Rate your confidence (0-4).
Chapter Summary
Containers are ephemeral — data written to a container's filesystem disappears on restart. Kubernetes solves this through volumes from ephemeral (emptyDir) to fully persistent (PersistentVolumes). The PV/PVC pattern separates storage provisioning from consumption. StorageClasses enable dynamic provisioning on demand. On GKE, choose from GCE PD (RWO), Filestore (RWX), GCS Fuse (object storage), and Hyperdisk (high performance). The defining moment: your database data survives Pod deletion because it lives on a PersistentVolume that outlives any container.
📇 KEY CONCEPT CARDS
- Q: What happens to data written inside a container's filesystem when the container restarts? A: It is lost permanently. The container's writable layer is ephemeral. Only data written to a mounted volume persists.
- Q: What is the difference between a PersistentVolume (PV) and a PersistentVolumeClaim (PVC)? A: A PV is cluster-level storage provisioned by an admin or dynamic provisioner. A PVC is a user's request for storage. Kubernetes binds a PVC to a matching PV based on capacity, access mode, and storage class.
- Q: What does ReadWriteOnce (RWO) mean, and what is a common misconception about it? A: RWO means the volume can be mounted read-write by ONE node at a time. The common misconception is that it means "one Pod" — multiple Pods on the same node can share an RWO volume.
- Q: What is a StorageClass, and why is dynamic provisioning valuable? A: A StorageClass is a template defining a storage tier (provisioner, parameters, reclaim policy). Dynamic provisioning automatically creates PVs and underlying storage when a PVC is submitted, eliminating the need for admins to manually provision storage for every request.