Chapter 6.3 · Advanced Security and Audit

You have built security foundations: RBAC for keycard access, Pod Security Standards for building codes, and Network Policies for district roadblocks. But a truly secure cluster needs more: a permit office inspecting plans, food safety tracing ingredients, cameras recording incidents, and patrols detecting active threats. This chapter integrates admission controllers, supply chain security, audit logging, and runtime detection into defense-in-depth.

Analogy: City-Wide Security Infrastructure

Imagine a metropolis with layered security. The building permit office (admission controllers) reviews every construction plan: does it meet fire codes? Some plans get auto-corrected (mutating); others are approved or denied (validating). Without it, anyone could build unsafe structures.

The food safety inspection network (supply chain security) ensures every ingredient is traced and certified. If contamination enters, inspectors trace it to the source before harm spreads.

The security camera network (audit logging) captures timestamped records: who entered which building, what was accessed. These recordings reconstruct incidents.

The police patrols (runtime security) detect and respond to active threats.

The central command center (GKE Security Command Center) aggregates all feeds into a unified threat picture. Your cluster needs all these systems for enterprise security.

Visual Description: City Security Infrastructure Map

Visual Description: The map shows central command with four layers. The permit office (Admission Controllers) sits between the city gate (API Server) and construction sites. Every blueprint passes through before building begins. Inspection stations (Supply Chain Security) verify materials at entrances. A camera network (Audit Logging) records interactions, streaming to an archive. Patrol routes (Runtime Security) watch neighborhoods. All feed intelligence to central command.

graph TD
  subgraph "Central Command"
    SCC[Security Command Center]
  end
  subgraph "Layer 1: Admission Control"
    API[API Server] --> MW[Mutating Webhooks] --> VW[Validating Webhooks]
  end
  subgraph "Layer 2: Supply Chain"
    SIGN[cosign] --> REG[Artifact Registry] --> BA[Binary Authorization]
  end
  subgraph "Layer 3: Audit Logging"
    AUD[Audit Policy] --> AL[Cloud Logging Archive]
  end
  subgraph "Layer 4: Runtime Security"
    FAL[Falco] --> PM[Syscall Analysis]
  end
  subgraph "Cluster"
    WORK[Workloads]
  end
  VW -->|approved| WORK
  BA -->|verified| WORK
  FAL -->|watches| WORK
  WORK -->|events| AUD
  AL --> SCC
  FAL --> SCC
  BA --> SCC
  VW --> SCC
  style SCC fill:#d32f2f,color:#fff
  style API fill:#1976d2,color:#fff
  style BA fill:#388e3c,color:#fff
  style FAL fill:#7b1fa2,color:#fff
  style AL fill:#455a64,color:#fff

Defense in Depth: Layered Security Architecture

The following diagram shows how all security layers from prior chapters and this chapter work together. No single layer is sufficient; each catches threats that bypass the others.

graph LR
  subgraph "Layer 1: Perimeter"
    PRIVATE[Private Cluster] --> AUTHNET[Authorized Networks]
    AUTHNET --> INGRESS[Ingress TLS]
  end
  subgraph "Layer 2: Identity & Access"
    RBAC[RBAC] --> WI[Workload Identity]
    WI --> SA[ServiceAccounts]
  end
  subgraph "Layer 3: Admission & Policy"
    PSA[Pod Security Standards] --> AC[Admission Controllers]
    AC --> BA[Binary Authorization]
  end
  subgraph "Layer 4: Network Segmentation"
    NP[Network Policies] --> DP[Dataplane V2]
  end
  subgraph "Layer 5: Runtime & Audit"
    FALC[Falco Runtime] --> AUDIT[Audit Logging]
    AUDIT --> SCC2[Security Command Center]
  end
  PRIVATE --> RBAC
  RBAC --> PSA
  PSA --> NP
  NP --> FALC
  style PRIVATE fill:#f44336,color:#fff
  style RBAC fill:#ff9800,color:#000
  style PSA fill:#ffeb3b,color:#000
  style NP fill:#4caf50,color:#fff
  style FALC fill:#2196f3,color:#fff
  style SCC2 fill:#9c27b0,color:#fff

Each layer catches what prior ones missed. A compromised credential gets past the perimeter but still faces RBAC. A malicious image that slips past scanning still faces runtime detection. This is defense in depth: never rely on one control.

Admission Controllers: The Permit Office

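Every API request that creates, modifies, or deletes an object passes through admission controllers after authentication and authorization and before etcd persists it. Read requests (get, list, watch) skip admission. Think of it as a permit review: every blueprint is inspected before the city records it.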

There are two types of webhooks:

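Mutating webhooks can modify resources. Like a plan corrector saying, "Your blueprint is missing fire exits. I will add them." Built-in controllers mutate too: ServiceAccount assigns the default ServiceAccount to pods that do not name one, and DefaultStorageClass fills in a storage class on PersistentVolumeClaims that omit it. Mutators execute first.

Validating webhooks only approve or reject. Like an inspector saying, "This exceeds the height limit. Denied." Among the built-in controllers, ResourceQuota rejects requests that would exceed a namespace's quota, NamespaceLifecycle rejects new objects in a namespace that is terminating or does not exist, and Pod Security Admission rejects pods that violate the namespace's Pod Security Standard. LimitRanger does both jobs: it applies default requests and limits, then rejects pods outside the allowed range.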

For custom policies, deploy OPA Gatekeeper (Rego language) or Kyverno (YAML patterns). Both operate as webhooks enforcing rules at the API Server. Custom webhooks must be highly available: when a webhook's failurePolicy is Fail (the default in admissionregistration.k8s.io/v1) and the webhook is unreachable, every matching request is rejected until it recovers.
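To make the availability trade-off concrete, here is a minimal sketch of how a custom validating webhook might be registered. The webhook name, the policy-system namespace, the policy-webhook Service, and the /validate path are hypothetical, and the caBundle value is a placeholder you would replace with your own base64-encoded CA certificate.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-pod-policy            # hypothetical name
webhooks:
  - name: pods.policy.example.com     # hypothetical; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    # Fail = reject matching requests when the webhook cannot be reached.
    # Ignore = let them through, trading enforcement for availability.
    failurePolicy: Fail
    timeoutSeconds: 5
    clientConfig:
      service:
        namespace: policy-system      # hypothetical namespace
        name: policy-webhook          # hypothetical Service fronting the webhook pods
        path: /validate
      caBundle: "<base64-encoded-CA-bundle>"   # placeholder, not a real value
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
        scope: Namespaced
    # Skip kube-system so a webhook outage cannot block system pods.
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]
```

Keeping the rules narrow and excluding system namespaces limits how much of the cluster stalls if the webhook goes down.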

🛑 PAUSE & RECALL — 2 minutes

What is the difference between a mutating and validating admission webhook?

Name two built-in admission controllers and their purposes.

What happens if a required validating webhook is unreachable?

Rate your confidence (0-4).

Supply Chain Security: From Code to Trusted Image

The container image supply chain spans source code, dependencies, build environment, base image, registry, and cluster. Every link is an attack vector.

Visual Description: The pipeline flows left to right. Source code enters CI/CD. A minimal base image is pulled. An SBOM lists dependencies. The image is signed with cosign and pushed to Artifact Registry. Binary Authorization verifies the signature before deployment. Unsigned images are rejected.

graph LR
  CODE[Source Code] --> BUILD[Build]
  BASE[Trusted Base Image] --> BUILD
  BUILD --> SBOM[SBOM]
  BUILD --> SIGN[cosign Sign]
  SIGN --> REG[Artifact Registry]
  REG --> BA[Binary Authorization]
  BA -->|signed| DEPLOY[Deploy to GKE]
  BA -->|unsigned| REJECT[Rejected]
  style DEPLOY fill:#4caf50,color:#fff
  style REJECT fill:#f44336,color:#fff
  style BA fill:#ff9800,color:#000

GKE Note: Binary Authorization is a managed admission controller enforcing signature verification. You define attestation policies specifying which attestors must sign an image. Without it, any user or pipeline allowed to create workloads can deploy any image the cluster can pull.

Vulnerability scanning happens at two stages: in Artifact Registry (continuous monitoring) and at deployment time (Binary Authorization policy). An SBOM provides a machine-readable inventory enabling rapid response to new CVEs.

Use minimal base images like distroless — no shell, no package manager, just your application and its runtime dependencies.

Audit Logging: The Camera Network

While admission controllers prevent threats, audit logging records everything at the API Server. Every request from kubectl, controllers, or pods can be logged at configurable detail levels.

Kubernetes supports four audit levels:

None: No logging. For health checks and noise reduction.
Metadata: Who, when, which resource, which verb — no bodies.
Request: Metadata plus the request body. Captures the attempt.
RequestResponse: Everything including the response body. Most verbose.

You configure levels via an audit policy mapping resources and users. A well-designed policy captures critical events without storage overload.
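As an illustration of mapping levels to resources, the following is a sketch of an audit policy for a self-managed cluster, where the file is passed to kube-apiserver with --audit-policy-file. The choice of which resources get which level is an assumption made for the example, not a recommendation from this chapter; managed control planes such as GKE apply their own policy.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to avoid a duplicate entry per request.
omitStages:
  - "RequestReceived"
rules:
  # Health endpoints are pure noise: log nothing.
  - level: None
    nonResourceURLs: ["/healthz*", "/readyz*", "/livez*"]
  # Secrets and ConfigMaps: record who touched them, but no bodies.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # RBAC changes: keep the full request and response for forensics.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Catch-all: everything else at Metadata.
  - level: Metadata
```

Rules are checked in order and the first match decides the level, so the specific rules sit above the catch-all.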

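GKE Note: GKE writes Kubernetes audit logs to Cloud Audit Logs automatically. Google manages the control plane's audit policy, so you adjust coverage by enabling Data Access audit logs rather than by supplying your own policy file. For Enterprise, export to BigQuery or Chronicle for SIEM correlation.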

Audit logs enable forensic analysis: who created that privileged pod? When was the RBAC rule changed? Which ServiceAccount accessed that Secret?

Runtime Security: Active Patrols

Admission controls are preventive; audit logging is detective; runtime security detects threats unfolding inside the cluster.

Falco monitors system calls and Kubernetes audit events, detecting:

Unexpected shell spawns (compromise indicator)
Privilege escalation attempts
Sensitive file access (/etc/shadow, tokens)
Container escape attempts (host namespace access)
Unexpected outbound connections

Falco runs as a DaemonSet and captures syscalls through an eBPF probe (a kernel module driver is also available). When rules fire, alerts go to Slack, webhooks, or your SIEM.
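A custom rule for the first item in the list above might look like the sketch below. It assumes Falco's default ruleset is loaded before this file, because the spawned_process and container macros come from there; the rule name, shell list, and tags are illustrative choices.

```yaml
# Custom Falco rule: alert when an interactive shell starts inside a container.
# Relies on the spawned_process and container macros from Falco's default rules.
- rule: Shell Spawned in Container (example)
  desc: Detect a shell process starting inside any container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

In practice a rule like this needs exceptions for images that legitimately run shell entrypoints, otherwise it generates noise.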

GKE Note: GKE offers managed threat detection through Security Command Center (Container Threat Detection, available in its paid tiers) without installing Falco. It detects escapes, crypto mining, and privilege escalation using control plane and container signals.

🛑 PAUSE & RECALL — 2 minutes

Why is runtime security needed despite admission controls and Binary Authorization?

What is the difference between an SBOM and an image signature?

Which audit level is appropriate for read-only get operations on ConfigMaps?

Rate your confidence (0-4).

Security Hardening Checklist

Cluster Hardening:

[ ] Enable private cluster and authorized networks
[ ] Enable Workload Identity for pod-to-GCP authentication
[ ] Enable Shielded GKE nodes (Secure Boot, Integrity Monitoring)
[ ] Enable Application-layer secrets encryption (CMEK)

Network Security:

[ ] Enable Dataplane V2 for network policies and visibility
[ ] Deploy default-deny NetworkPolicies in sensitive namespaces

Workload Security:

[ ] Enforce Pod Security Standards (Restricted level)
[ ] Use Binary Authorization to require signed images
[ ] Scan images for vulnerabilities before deployment
[ ] Run containers as non-root with read-only root filesystems

Identity and Access:

[ ] Apply least-privilege RBAC
[ ] Disable automount of ServiceAccount tokens where unneeded

Observability:

[ ] Enable Kubernetes audit logging with tailored policy
[ ] Enable GKE Security Insights in Security Command Center

Supply Chain:

[ ] Sign images with cosign
[ ] Generate SBOMs for production images
[ ] Use minimal base images; pin digests in manifests
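Several checklist items translate directly into pod spec fields. The manifest below is a sketch showing non-root execution, a read-only root filesystem, and a disabled ServiceAccount token mount together; the pod name, image reference, and UID are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example              # hypothetical name
spec:
  # Checklist: disable token automount where the pod never calls the API.
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true                # kubelet refuses to start a UID 0 container
    runAsUser: 10001                  # arbitrary non-root UID for the example
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      # Hypothetical image; in production, reference it by digest rather than tag.
      image: registry.example.com/team/app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: 250m
          memory: 128Mi
      volumeMounts:
        # A read-only root filesystem usually needs a writable scratch dir.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```

These settings also satisfy the Restricted Pod Security Standard's main requirements, which is why they appear together.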

GKE Enterprise Security

Security Command Center aggregates GKE findings: threat detection, vulnerability scanning, and misconfigurations. It generates actionable findings with remediation guidance when container escapes or policy violations are detected.

Chronicle SIEM ingests audit logs, VPC flow logs, and other signals, correlating them with threat intelligence. It builds multi-signal threat stories — connecting suspicious pod creation with anomalous network flows and RBAC changes.

Assured Workloads enforces compliance by restricting data residency, encryption keys, and service usage to specific frameworks like HIPAA or FedRAMP. Access Transparency logs all Google admin actions on your resources.

GKE in Practice

GKE Note: GKE's security posture dashboard provides a single-pane view of cluster security, flagging clusters lacking Workload Identity, Binary Authorization, or Shielded nodes with one-click remediation. Review this dashboard weekly for production clusters.

Enabling Binary Authorization requires three steps: create an attestor with a Container Analysis note; configure CI/CD to sign images with a KMS key; set the GKE cluster policy to require that attestor. Unsigned deployments are rejected with a clear policy violation message.
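The third step can be sketched as a policy file in the YAML form accepted by gcloud container binauthz policy import. PROJECT_ID and ATTESTOR_NAME are placeholders for your own project and the attestor created in the first step.

```yaml
# Binary Authorization policy sketch: block any image lacking an attestation.
# Allow Google-maintained system images needed by GKE itself.
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  # Require a valid attestation from every listed attestor.
  evaluationMode: REQUIRE_ATTESTATION
  # Block non-compliant images and write the denial to the audit log.
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/PROJECT_ID/attestors/ATTESTOR_NAME   # placeholder path
```

Switching enforcementMode to DRYRUN_AUDIT_LOG_ONLY logs violations without blocking, which is a safer way to trial the policy before enforcing it.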

🤔 TRY BEFORE YOU SEE

Write two audit policy rules: one that logs request bodies (not responses) for all Secret creation and update operations, and one that excludes the system:serviceaccount:kube-system:generic-garbage-collector user entirely. Write your attempt before seeing the solution.

Reveal: First, a None level rule matching the garbage collector user with no resource filter, so every request from it is dropped. Second, a Request level rule matching all users for Secrets with verbs create and update. Order matters: the policy is evaluated top-down and the first matching rule wins, so the None rule must come first.
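Written out, the two rules could look like the following sketch. Only the user name and the Secret verbs come from the exercise; the rest is the minimum an audit policy file needs.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Rule 1: drop everything from the garbage collector.
  # No resources or verbs filter, so it matches all of its requests.
  - level: None
    users: ["system:serviceaccount:kube-system:generic-garbage-collector"]
  # Rule 2: log the request body (not the response) for Secret writes.
  - level: Request
    verbs: ["create", "update"]
    resources:
      - group: ""                     # core API group
        resources: ["secrets"]
```

Note that Request level on Secrets writes the submitted Secret payload into the audit log, so the log itself must be protected as sensitive data.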

⚠️ Common Misconception: Teams sometimes believe vulnerability scanning replaces Binary Authorization. It does not. Scanning identifies known vulnerabilities at a point in time; Binary Authorization ensures only approved, signed images deploy. These controls are complementary.

Lab: LAB-6.3 — Advanced Security Implementation (75 min)

You will install OPA Gatekeeper, enforce a constraint, configure audit logging, and review Security Command Center findings.

Prerequisites: A GKE Standard cluster with cluster-admin access.

Step 1: Install OPA Gatekeeper (15 min)

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
kubectl wait --for=condition=Ready pod -l control-plane=controller-manager \
  -n gatekeeper-system --timeout=120s

Step 2: Apply a Constraint Template and Constraint (20 min)

Create required-resources-template.yaml:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresources
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResources
      validation:
        # templates.gatekeeper.sh/v1 expects the parameter schema under openAPIV3Schema
        openAPIV3Schema:
          type: object
          properties:
            limits:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredresources
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          required := input.parameters.limits[_]
          not container.resources.limits[required]
          msg := sprintf("Container %s must define %s limit", [container.name, required])
        }

kubectl apply -f required-resources-template.yaml

Create required-resources-constraint.yaml:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: require-cpu-and-memory-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]
  parameters:
    limits: ["cpu", "memory"]

kubectl apply -f required-resources-constraint.yaml

Step 3: Test Gatekeeper (10 min)

Create bad-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: test-no-limits
spec:
  containers:
    - name: nginx
      image: nginx:1.25

kubectl apply -f bad-pod.yaml

Expected output: Gatekeeper rejects the pod with an error requiring CPU and memory limits. Deploy a compliant pod with limits defined and verify it succeeds.
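For the compliant case, a manifest along these lines should pass the constraint. The file name good-pod.yaml, the pod name, and the limit values are arbitrary choices for the lab.

```yaml
# good-pod.yaml: same pod as before, now with both required limits set.
apiVersion: v1
kind: Pod
metadata:
  name: test-with-limits
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      resources:
        limits:
          cpu: 250m        # satisfies the "cpu" entry in parameters.limits
          memory: 128Mi    # satisfies the "memory" entry
```

Apply it with kubectl apply -f good-pod.yaml and confirm the pod is created without a Gatekeeper denial.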

Step 4: Verify Audit Logging (15 min)

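Pod creation is a write operation, so it is recorded in Admin Activity audit logs, which are always enabled. To also capture read operations such as get and list, enable Data Access audit logs for the Kubernetes Engine API in Cloud Console. Query Cloud Logging for recent pod creations: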

gcloud logging read \
  'protoPayload.serviceName="k8s.io" AND protoPayload.methodName="io.k8s.core.v1.pods.create"' \
  --limit=10 --format="table(timestamp,protoPayload.authenticationInfo.principalEmail)"

Step 5: Review Security Command Center (15 min)

Navigate to Security Command Center in Cloud Console. Look for GKE findings such as clusters without Binary Authorization or authorized networks. Note which would be critical in production and what remediation steps you would apply.

Expected outcome: Gatekeeper blocks non-compliant pods, audit logs capture API activity, and Security Command Center surfaces configuration findings — demonstrating preventive control, detective logging, and centralized visibility working together.

📇 KEY CONCEPT CARDS

Q: What is the difference between a mutating and a validating admission webhook?
A: Mutating webhooks can modify resources before storage (e.g., adding defaults). Validating webhooks can only approve or reject. Mutators execute before validators in the admission chain.

Q: What are the four Kubernetes audit levels?
A: None (no logging), Metadata (who/when/what verb, no bodies), Request (metadata plus request body), RequestResponse (everything including response body).

Q: What does GKE Binary Authorization enforce?
A: It verifies image signatures at deployment time, ensuring only cryptographically signed images from trusted attestors run. It prevents unauthorized image deployment even if an attacker has kubectl access.

Q: In the city analogy, what are admission controllers, supply chain security, audit logging, and runtime security?
A: Admission controllers = building permit office. Supply chain security = food safety inspections. Audit logging = security camera recordings. Runtime security = police patrols detecting active threats.
