Why caches exist, where they help — and what they quietly break.
What can a cache improve, and what new problem can it create?
Before reading, choose one value that may be safely stale and one value that must not be stale.
Caching returns in URL redirects, product catalogs, feeds, profiles, and CDN-backed media.
Candidate hits a slow read path. Reaches for "Redis." The interviewer waits. The candidate adds "in front of the database." The interviewer waits longer.
The word "cache" doesn't say what's being stored, where the cache sits, what the key is, what happens on a miss, or how stale the data is allowed to get. This chapter is about saying all of that out loud — and saying it from a first-principles place, not a tool-name place.
Two physical truths make caching possible:
A small subset of items is asked for repeatedly. A short URL goes viral; one product page gets all the traffic; the same user profile is fetched a hundred times. If everything were equally cold, caching wouldn't help. It almost never is.
A copy in RAM near the application can return in under a millisecond. A round-trip to a database across the network is tens to hundreds. That gap is the entire economic case for caching.
Caching exploits both at once: keep the hot, repeatedly-asked-for data in fast memory close to the read path. The cost is one new question — what happens when the underlying data changes? That question is the whole rest of this chapter.
The name doesn't matter. The logic does. Memorize the shape — under interview pressure your brain will fall back on this exact loop.
A hot key expires. A thousand requests for it arrive in the same second. They all miss. They all go to the database simultaneously.
This is the failure mode candidates often miss when they say "we'll just cache it." The fix isn't conceptually exotic — TTL jitter so all keys don't expire together, request coalescing so only one request fetches and the rest wait, or refresh-ahead so popular keys never expire cold. The point for the interview is simply to know it can happen and to mention it before the interviewer asks.
A cache is a simple speed layer in front of a slow database.
A cache is a copy with a freshness contract. It helps repeated reads but creates invalidation and consistency questions.
Say what is cached, where it sits, how long it lives, and what happens when the source changes.
| Caching choice | Good when | Weak when | Interview line |
|---|---|---|---|
| No cache | Data volume is low, reads are cheap, or exact freshness matters most. | Repeated reads are overwhelming the database or query latency is too high. | I'd skip caching in the first version unless the read path becomes a clear bottleneck. |
| Local in-process cache | One server can benefit from very fast memory lookups and the data is small or short-lived. | Multiple app instances need a shared view, or stale divergence becomes a problem. | A local cache is the cheapest performance win, but each server only helps itself. |
| Shared distributed cache Default | Many app servers need to reuse hot data and reduce database load together. | The system is too small to justify another network hop and another stateful component. | A shared cache helps when hot keys should benefit every server, not just the one that fetched them first. |
| Aggressive cache of dynamic data | Latency pressure is severe and the product can tolerate staleness. | Users expect exact fresh state or invalidation becomes too complex. | I'd only cache this aggressively if slight staleness is acceptable and we have a clear invalidation or TTL strategy. |
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would cache public profile reads, but keep password, permission, and payment state closer to the source of truth.
For the profile prompt, mark which fields can be cached and which should stay strongly fresh.
Add one cache to the practice design and state its invalidation rule.
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
Locality of reference is the whole reason caching works.
Five answers turn "we'll cache" into a real design.
Naming the staleness trade-off out loud is what separates strong candidates.