Part 2 · Core Building Blocks · Chapter 6

Caching, without confusion.

Why caches exist, where they help — and what they quietly break.

Learning objective

Explain caching in interviews as a response to repeated or expensive reads — including where to place the cache, what to cache, and how to talk clearly about staleness and invalidation.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

What can a cache improve, and what new problem can it create?

Commit

Write a rough answer.

Before reading, choose one value that may be safely stale and one value that must not be stale.

Connect

Notice where it returns.

Caching returns in URL redirects, product catalogs, feeds, profiles, and CDN-backed media.

Concrete first

The most common one-word answer in system design interviews.

Candidate hits a slow read path. Reaches for "Redis." The interviewer waits. The candidate adds "in front of the database." The interviewer waits longer.

The word "cache" doesn't say what's being stored, where the cache sits, what the key is, what happens on a miss, or how stale the data is allowed to get. This chapter is about saying all of that out loud, and saying it from first principles rather than from a tool name.

First principles

Why caching even works.

Two physical truths make caching possible:

1 · Locality of reference

Most workloads are not uniformly distributed.

A small subset of items is asked for repeatedly. A short URL goes viral; one product page gets all the traffic; the same user profile is fetched a hundred times. If everything were equally cold, caching wouldn't help. It almost never is.
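
A quick way to see why skew matters is to replay two request streams through the same small cache. The sketch below is illustrative only: the catalog size, the cache capacity, the 1/rank popularity curve, and the LRU eviction policy are all assumptions chosen for the demonstration, not figures from this chapter.

```python
import random
from collections import OrderedDict


def hit_rate(keys, capacity):
    """Replay a request stream through a tiny LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(keys)


rng = random.Random(42)                    # fixed seed so runs are repeatable
items = list(range(10_000))                # assumed catalog of 10,000 items

# Skewed workload: the item at popularity rank r is requested in proportion to 1/r.
weights = [1 / (rank + 1) for rank in range(len(items))]
skewed = rng.choices(items, weights=weights, k=50_000)

# Uniform workload: every item is equally likely.
uniform = rng.choices(items, k=50_000)

skewed_rate = hit_rate(skewed, capacity=100)    # cache holds 1% of the catalog
uniform_rate = hit_rate(uniform, capacity=100)
print(f"skewed:  {skewed_rate:.0%}")
print(f"uniform: {uniform_rate:.0%}")
```

Under these assumptions the cache holds only 1% of the items, so the uniform stream should hit about 1% of the time, while the skewed stream should hit far more often. Same cache, same size; only the shape of the traffic changed.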

2 · Asymmetric speed

Reading from local memory is orders of magnitude faster than reading from disk over the network.

A copy in RAM near the application can return in under a millisecond. A round-trip to a database across the network, especially one that runs an expensive query or touches disk, can take tens to hundreds of milliseconds. That gap is the entire economic case for caching.

Caching exploits both at once: keep the hot, repeatedly-asked-for data in fast memory close to the read path. The cost is one new question — what happens when the underlying data changes? That question is the whole rest of this chapter.

Mental model

A cache is a shortcut shelf, not the source of truth.

The warehouse in the back is the truth. The shelf near the counter is just a copy of what people ask for repeatedly. Speed improves. Freshness gets harder.

If the shelf has it, return fast. If not, walk to the warehouse, return the answer, and remember it on the way back.

The default pattern

Cache-aside: hit, miss, fill.

The name doesn't matter. The logic does. Memorize the shape — under interview pressure your brain will fall back on this exact loop.
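
The hit, miss, fill loop can be sketched in a few lines. This is a minimal illustration, not a production design: `TTLCache` is a toy in-memory stand-in for whatever cache is actually used, and `DATABASE`, `read_from_database`, and the 300-second TTL are hypothetical names and values introduced only for the example.

```python
import time


class TTLCache:
    """Toy in-memory cache with per-key expiry (stand-in for a real cache)."""

    def __init__(self):
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, time.monotonic() + ttl_seconds)


# Hypothetical source of truth and a counter to show how often it is read.
DATABASE = {"abc123": "https://example.com/some/long/path"}
db_reads = 0


def read_from_database(key):
    global db_reads
    db_reads += 1
    return DATABASE.get(key)


def get_with_cache_aside(cache, key, ttl_seconds=300):
    value = cache.get(key)
    if value is not None:
        return value                        # hit: return the fast copy
    value = read_from_database(key)         # miss: fall back to the source of truth
    if value is not None:
        cache.set(key, value, ttl_seconds)  # fill: remember it on the way back
    return value


cache = TTLCache()
first = get_with_cache_aside(cache, "abc123")   # miss, reads the database
second = get_with_cache_aside(cache, "abc123")  # hit, database untouched
print(first == second, db_reads)
```

The second call never reaches the database. Note that the application, not the cache, owns the miss logic, which is what "aside" refers to.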

If nearly all reads hit, the average response time approaches the hit-path latency, and the database sees only the misses. That's the magic, and the reason caching can carry traffic that would otherwise melt the database.
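
How strongly the hit path shapes the average depends on the hit rate, and a weighted average makes that concrete. The latencies below (1 ms for a hit, 50 ms for the full miss path) are assumed round numbers for illustration, not measurements.

```python
def average_latency_ms(hit_rate, hit_ms, miss_ms):
    """Expected read latency: hits and misses weighted by how often each happens."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms


for rate in (0.50, 0.90, 0.99):
    avg = average_latency_ms(rate, hit_ms=1.0, miss_ms=50.0)
    print(f"hit rate {rate:.0%}: {avg:.2f} ms")
```

With these assumed numbers, a 90% hit rate still averages about 5.9 ms because the slow misses carry most of the weight; at 99% the average drops to about 1.5 ms. The hit rate is worth stating out loud, because the benefit is not linear in it.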

Why it matters in interviews

Saying "cache" isn't an answer. Saying these five things is.

What is being cached?
Where does the cache sit?
What is the key?
What happens on a miss?
How long does data stay there — and what happens when the underlying data changes?

Weak

We can put Redis in front of the database.

Strong

The hot path is repeated lookup by short code, so I'd add a shared cache in front of the database. On a miss, the service reads from the database, returns the result, and stores it under the short code with a TTL. The database stays the source of truth.

Key ideas

Seven anchors.

Cache repeated or expensive reads, not everything.
The source of truth usually remains the database or primary storage system.
A cache hit is fast; a cache miss falls back to the source of truth.
Cache keys, TTLs, and invalidation strategy matter as much as cache placement.
Local caches are simple and fast, but different app servers may see different values.
Shared distributed caches improve reuse across servers, but add operational complexity.
Caches reduce latency and database load — but risk stale or inconsistent reads.
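
The difference between local and shared caches is easiest to see with two servers and one update. The sketch below uses plain dictionaries as caches and a hypothetical `AppServer` class; there is no TTL or invalidation, on purpose, so the staleness is visible.

```python
database = {"user:42:name": "Ada"}  # hypothetical source of truth


class AppServer:
    def __init__(self, cache):
        self.cache = cache  # a plain dict used as the cache

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = database[key]  # miss: fill from the source
        return self.cache[key]


# Local caches: each server owns its own dict.
server_a = AppServer(cache={})
server_b = AppServer(cache={})

server_a.read("user:42:name")        # server A caches "Ada"
database["user:42:name"] = "Ada L."  # the source changes; nothing is invalidated
local_view = (server_a.read("user:42:name"), server_b.read("user:42:name"))

# Shared cache: both servers point at the same dict.
database["user:42:name"] = "Ada"
shared = {}
server_c = AppServer(cache=shared)
server_d = AppServer(cache=shared)

server_c.read("user:42:name")        # fills the shared cache with "Ada"
database["user:42:name"] = "Ada L."
shared_view = (server_c.read("user:42:name"), server_d.read("user:42:name"))

print(local_view)   # the two servers disagree
print(shared_view)  # the two servers agree, but both are stale
```

A shared cache does not remove staleness. It makes every server see the same copy, which is usually easier to reason about and to invalidate.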

The failure mode that catches everyone

Cold caches, hot misses, and thundering herds.

A hot key expires. A thousand requests for it arrive in the same second. They all miss. They all go to the database simultaneously.

This is the failure mode candidates often miss when they say "we'll just cache it." The fixes aren't conceptually exotic: TTL jitter so keys don't all expire together, request coalescing so only one request fetches and the rest wait, or refresh-ahead so popular keys are refreshed before they expire. The point for the interview is simply to know it can happen and to mention it before the interviewer asks.
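
Request coalescing is the least obvious of these fixes, so here is one way it might look inside a single process. This is a sketch under assumptions: `CoalescingLoader`, `slow_db_read`, the 0.3-second delay, and the 20 simulated requests are all hypothetical, and a real deployment would also need to handle coalescing across servers.

```python
import threading
import time


class _Call:
    """Bookkeeping for one in-flight load."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class CoalescingLoader:
    """One caller per key runs the load; concurrent callers wait and share its result."""

    def __init__(self, load):
        self._load = load
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> _Call

    def get(self, key):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.value = self._load(key)
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()       # wake every waiting follower
        else:
            call.done.wait()          # follower: wait for the leader's result
        if call.error is not None:
            raise call.error
        return call.value


db_calls = 0
db_calls_lock = threading.Lock()


def slow_db_read(key):
    """Hypothetical slow database read that counts how often it runs."""
    global db_calls
    with db_calls_lock:
        db_calls += 1
    time.sleep(0.3)
    return f"value-for-{key}"


loader = CoalescingLoader(slow_db_read)
results = []
results_lock = threading.Lock()
start = threading.Barrier(20)


def worker():
    start.wait()                      # release all 20 requests together
    value = loader.get("hot-key")
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "requests served by", db_calls, "database read(s)")
```

All 20 callers get an answer while the database is asked only once. In a full design the leader would also populate the cache before releasing the followers.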

Speaking script

Lines for the cache conversation.

Opening

I'd add caching only if repeated reads or expensive queries are becoming the bottleneck.

Framing

The question isn't whether to cache. It's what to cache, where to cache it, and how stale we can tolerate it being.

Sketching

My default pattern is cache-aside: check the cache first, read from the database on miss, then populate the cache.

Truthfulness

The database remains the source of truth. The cache is only a fast copy.

Trade-off

A local cache is the cheapest win, but each server only helps itself. A shared cache helps every server but adds a network hop and another stateful component to operate.

Failure mode

I'd also think about cold-cache and mass-expiry storms — TTL jitter or request coalescing keeps the database from getting hammered when a hot key expires.

Common mistakes

How candidates make caching sound free.

Adding a cache before identifying a read bottleneck.
Saying "use Redis" without saying what key is cached or how misses work.
Treating the cache as the source of truth.
Ignoring staleness and invalidation.
Assuming every item is equally worth caching, even when only a small set is hot.
Forgetting that a cold cache or mass expiry can cause a surge of database traffic.
Caching highly dynamic data where freshness matters more than latency.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

A cache is a simple speed layer in front of a slow database.

Better model

What to replace it with

A cache is a copy with a freshness contract. It helps repeated reads but creates invalidation and consistency questions.

Interview move

What to do in the room

Say what is cached, where it sits, how long it lives, and what happens when the source changes.

Trade-offs

Four placements.

Caching choice	Good when	Weak when	Interview line
No cache	Data volume is low, reads are cheap, or exact freshness matters most.	Repeated reads are overwhelming the database or query latency is too high.	I'd skip caching in the first version unless the read path becomes a clear bottleneck.
Local in-process cache	One server can benefit from very fast memory lookups and the data is small or short-lived.	Multiple app instances need a shared view, or stale divergence becomes a problem.	A local cache is the cheapest performance win, but each server only helps itself.
Shared distributed cache (default)	Many app servers need to reuse hot data and reduce database load together.	The system is too small to justify another network hop and another stateful component.	A shared cache helps when hot keys should benefit every server, not just the one that fetched them first.
Aggressive cache of dynamic data	Latency pressure is severe and the product can tolerate staleness.	Users expect exact fresh state or invalidation becomes too complex.	I'd only cache this aggressively if slight staleness is acceptable and we have a clear invalidation or TTL strategy.

Mini case study

URL shortener — caching, made specific.

Why this is a strong cache candidate

Lookups are read-heavy.
Many popular links are requested repeatedly.
The answer is small.
The mapping rarely changes.

Concrete design

Cache key: short code.
Cache value: long URL + redirect metadata.
On hit: return.
On miss: read from primary store, return, populate cache.

What goes right

Lower redirect latency.
Far fewer database reads.
Better survival under viral-link bursts.

What can go wrong

Updated or deleted links may serve briefly stale data.
A hot key expiring can spike the database.
Mitigation: TTL jitter, refresh-ahead, or request coalescing.
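
Two of the pieces above, TTL jitter and invalidation on change, are small enough to sketch directly. Everything named here is an assumption for illustration: the one-hour base TTL, the 10% jitter, the `url:` key prefix, the `status_code` field standing in for redirect metadata, and a plain dict standing in for the cache.

```python
import random

BASE_TTL_SECONDS = 3600  # assumed one-hour base TTL
JITTER_FRACTION = 0.10   # assumed spread of plus or minus 10%


def jittered_ttl(base=BASE_TTL_SECONDS, fraction=JITTER_FRACTION, rng=random):
    """Nudge the base TTL by a random amount so keys filled together expire apart."""
    spread = base * fraction
    return base + rng.uniform(-spread, spread)


def cache_entry(short_code, long_url, status_code=302):
    """Build the key, value, and TTL for one redirect; the key is the short code."""
    key = f"url:{short_code}"
    value = {"long_url": long_url, "status_code": status_code}
    return key, value, jittered_ttl()


def on_link_changed(cache, short_code):
    """Invalidate on update or delete so the next read refills from the primary store."""
    cache.pop(f"url:{short_code}", None)


cache = {}
key, value, ttl = cache_entry("abc123", "https://example.com/some/long/path")
cache[key] = value                 # a real cache would store this with the TTL
on_link_changed(cache, "abc123")   # link edited or deleted: drop the cached copy
print(key, round(ttl), key in cache)
```

Jitter addresses many keys expiring in the same instant; it does not help with a single hot key expiring, which is where request coalescing or refresh-ahead comes in.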

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would cache public profile reads, but keep password, permission, and payment state closer to the source of truth.

We do

Complete the missing piece.

For the profile prompt, mark which fields can be cached and which must always be read fresh from the source of truth.

You do

Answer without notes.

Add one cache to the practice design and state its invalidation rule.

Practice

Try it before you read the model answer.

Prompt

"Design a read-heavy user profile service."

What would you cache?
Where would the cache sit?
What is the cache key?
How would you handle stale data?

A strong model answer

I'd cache profile reads because the same profiles are likely to be requested many times. I'd use a shared cache in front of the primary database so all app servers can benefit from the same hot entries. The cache key would be the user ID. On a miss, the application would read from the database and populate the cache with a TTL. The database would remain the source of truth, and profile updates would either invalidate the cached entry or let it expire quickly, depending on how fresh the product needs the data to be.
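
The model answer can be turned into a small sketch: key by user ID, cache only fields that tolerate staleness, and invalidate on update. The field names, the `profile:` key prefix, and the dict-based `database` and `cache` are hypothetical; a real cache would also attach a TTL to each entry.

```python
# Assumed set of public fields that can tolerate brief staleness.
CACHEABLE_FIELDS = ("display_name", "bio", "avatar_url")

# Hypothetical source of truth; sensitive fields live here and are never cached.
database = {
    "u1": {
        "display_name": "Ada",
        "bio": "Engineer",
        "avatar_url": "/a/u1.png",
        "password_hash": "<redacted>",
        "payment_status": "active",
    },
}
cache = {}
db_reads = 0


def cache_key(user_id):
    return f"profile:{user_id}"


def get_public_profile(user_id):
    global db_reads
    key = cache_key(user_id)
    if key in cache:
        return cache[key]                # hit
    db_reads += 1
    row = database[user_id]              # miss: read the source of truth
    public = {field: row[field] for field in CACHEABLE_FIELDS}
    cache[key] = public                  # fill (a real cache would set a TTL here)
    return public


def update_profile(user_id, **changes):
    database[user_id].update(changes)    # write the source of truth first
    cache.pop(cache_key(user_id), None)  # then invalidate the cached copy


before = get_public_profile("u1")
get_public_profile("u1")                 # served from the cache
update_profile("u1", bio="Mathematician")
after = get_public_profile("u1")         # refilled after invalidation
print(before["bio"], "->", after["bio"], "| database reads:", db_reads)
```

Invalidating on write keeps the cached profile fresh for the common case, and the TTL acts as a backstop if an invalidation is ever missed.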

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Find the repeated expensive read. Cache that.

Locality of reference is the whole reason caching works.

Say what · where · key · miss · expiry.

Five answers turn "we'll cache" into a real design.

Speed improves. Freshness gets harder.

Naming the staleness trade-off out loud is what sets strong candidates apart.

Reusable interview line

"I'd cache the hot read path with cache-aside: check first, fall through to the database on miss, populate with a TTL, and let the database remain the source of truth."