Part 3 · Designing at Scale · Chapter 10

Scaling reads.

How to serve more readers without making one system do all the work.

Learning objective

Identify the actual read bottleneck and choose the fix that matches it (caching, replicas, precomputed views, or edge delivery), using language tied directly to the access pattern.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

If many users read the same product data, what are three ways to avoid hitting one database repeatedly?

Commit

Write a rough answer.

Before reading, choose whether this is a cache problem, replica problem, CDN problem, or precompute problem.

Connect

Notice where it returns.

Read scaling returns in feeds, catalogs, URL redirects, profile pages, and media playback.

Concrete first

In most products, the read path strains before the write path does.

Many products are opened far more often than they are changed. One profile gets edited occasionally but viewed constantly. One short URL is created once and clicked thousands of times.

When reads dominate, the problem is not always "the database is slow." Sometimes the same object is fetched repeatedly. Sometimes the query itself is expensive. Sometimes the real pain is assembling the response. Sometimes the content is just too far from the user. The fix should come from the type of slowness.

Mental model

Four common reasons the read path becomes slow.

The database is getting too many repeated lookups of the same hot objects.
The reads still need to query the database, but the primary should not serve all of them.
The expensive part is assembling the response, not fetching one record.
The content is static or blob-heavy and is being fetched from too far away.

Why it matters in interviews

Bottleneck-driven answers sound like engineering. Tool-driven answers sound like guessing.

Weak

We can scale reads with Redis.

Strong

If the issue is repeated lookup of the same hot objects, caching is the first fix. If the issue is too many reads that still have to query the database, I would add read replicas. If building the response is the expensive part, I would precompute the read model. If the content is static or blob-heavy and globally accessed, I would use a CDN.

The strong answer chooses the technique from the shape of the problem, not from habit.
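The strong answer is effectively a lookup from bottleneck to first fix. A minimal sketch of that mapping follows; the enum and function names are illustrative assumptions, not part of any library.

```python
from enum import Enum


class Bottleneck(Enum):
    """The four read-path problems named in the mental model."""
    REPEATED_HOT_READS = "same hot objects fetched repeatedly"
    DATABASE_READ_PRESSURE = "primary serves too many reads"
    EXPENSIVE_ASSEMBLY = "building the response is the cost"
    DISTANT_STATIC_CONTENT = "static or blob content is far from users"


# One first fix per bottleneck, mirroring the strong answer.
FIRST_FIX = {
    Bottleneck.REPEATED_HOT_READS: "cache",
    Bottleneck.DATABASE_READ_PRESSURE: "read replicas",
    Bottleneck.EXPENSIVE_ASSEMBLY: "precomputed read model",
    Bottleneck.DISTANT_STATIC_CONTENT: "CDN",
}


def first_fix(bottleneck: Bottleneck) -> str:
    """Return the first fix to propose for a named bottleneck."""
    return FIRST_FIX[bottleneck]
```

The point of the sketch is the order of operations: the bottleneck is the input and the tool is the output, never the reverse.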

Key ideas

Seven anchors.

Read bottlenecks often appear before write bottlenecks.
Cache repeated hot reads first when freshness allows it.
Use read replicas when many reads still need database access.
Precompute expensive read results when building the response is the real cost.
Use CDN or edge delivery when static or blob content is fetched repeatedly across regions.
Read scaling usually trades freshness or simplicity for speed and capacity.
The best answer names the bottleneck, then picks the smallest justified fix.

Speaking script

Lines for the read-scaling conversation.

Opening

I want to scale reads based on where the read path is actually getting expensive.

Sketching

If many users ask the same question repeatedly, caching is my first move.

Deep dive

If reads still need the database but the primary is overloaded, I would add read replicas. If the expensive part is building the response, I would precompute the read model instead of reconstructing it on every request.

Trade-off

The gain is better latency and more read capacity. The cost is usually more staleness, more complexity, or both.

Extending

If content is static or blob-heavy and globally accessed, I would push it closer to users with a CDN.

Defending

I would rather add the smallest justified read-scaling move first than throw cache, replicas, and CDN at the system all at once.

Common mistakes

How candidates flatten very different read problems into one answer.

Treating every read bottleneck as a cache problem.
Adding replicas when the real issue is expensive query computation.
Adding a cache without naming the hot objects or repeated-read pattern.
Expecting replicas to serve every freshness-sensitive read, even though replication lag can leave them behind the primary.
Forgetting that feeds and aggregates may need precomputation, not just faster storage.
Using a CDN for content that is too private, too dynamic, or not read enough to benefit.
Naming several read-scaling tools at once without explaining what each one fixes.
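The replica mistake in the list above is easier to avoid when the routing rule is written down. The sketch below assumes a hypothetical `ReadRouter` that sends freshness-sensitive reads to the primary and spreads everything else across replicas in round-robin order. The node names are placeholders, and real drivers or proxies expose their own routing options.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReadRouter:
    """Hypothetical router: fresh reads go to the primary, the rest to replicas."""
    primary: str
    replicas: list[str]
    _cycle: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Round-robin over replicas; never advanced when the list is empty.
        self._cycle = itertools.cycle(self.replicas)

    def pick(self, needs_fresh: bool) -> str:
        """Choose the node that should serve this read."""
        if needs_fresh or not self.replicas:
            # Replicas can lag behind the primary, so reads that must
            # see the latest write stay on the primary.
            return self.primary
        return next(self._cycle)
```

A read such as "show me the profile I just edited" would pass `needs_fresh=True`; a public listing page usually would not.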

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Scaling reads means adding Redis.

Better model

What to replace it with

Read scaling is about serving copies safely: caches, replicas, CDNs, indexes, and precomputed views all help different read patterns.

Interview move

What to do in the room

Name the read path, freshness tolerance, and source of truth before picking the read-scaling tool.

Trade-offs

Five read-scaling choices.

Read-scaling choice	Good when	Weak when	Interview line
Keep reads on one primary store	Traffic is still small and the read path is simple.	The primary becomes the read bottleneck or latency is too high.	I would keep the first version simple until reads clearly pressure the primary path.
Cache hot data (default)	The same objects are requested repeatedly and slight staleness is acceptable.	Reads are highly dynamic or invalidation is harder than the saved latency is worth.	Caching helps if the same hot objects are being fetched over and over.
Read replicas	Many reads still need database queries and the primary should stop serving all of them.	The main pain is staleness-sensitive reads, expensive joins, or response assembly rather than raw read volume.	Read replicas help when I need more database read capacity without sending every read to the primary.
Precomputed read view	Building the response is expensive, such as feeds, rankings, or aggregates.	The read is simple enough that precomputation adds unnecessary complexity.	If the cost is assembling the view, I would precompute that view instead of rebuilding it on every request.
CDN or edge cache	Static or blob content is globally requested and repeat reads are common.	Content is highly private, rarely read, or must always be fetched from origin.	A CDN helps when the read problem is global content delivery, not just database pressure.

Mini case study

News feed: not one read problem, but three.

Simple read path

Read recent posts from storage.
Fine when the system is still small.
No extra machinery yet.

Repeated hot reads

Users reopen the same feed often.
Cache feed fragments or session reads.
Good when slight staleness is acceptable.
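A minimal cache-aside sketch for the repeated-read case, assuming an in-process dictionary stands in for a real cache and that a fixed time-to-live is an acceptable staleness bound. `TTLCache` and its `load` callback are hypothetical names for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside with a fixed time-to-live per entry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get_or_load(self, key: str, load: Callable[[str], object]) -> object:
        """Return the cached value, or load it from the source of truth."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: still inside the staleness window
        value = load(key)    # miss or expired: read the source of truth
        self._entries[key] = (now + self.ttl, value)
        return value
```

The time-to-live is the "slight staleness" made explicit: a reader may see a feed fragment that is up to `ttl_seconds` old.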

Expensive assembly

The feed is expensive to build live.
Precompute or partially materialize it.
That attacks computation, not just storage.
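One way to precompute a feed is fan-out on write: when an author publishes, the post id is pushed into each follower's stored feed. The chapter does not prescribe this technique; the sketch below is one option under simple assumptions (in-memory structures, a tiny feed cap, hypothetical function names).

```python
from collections import defaultdict, deque

FEED_LIMIT = 3  # assumed cap per materialized feed, kept small for the example

followers = defaultdict(set)  # author -> set of follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user -> newest-first post ids


def follow(follower: str, author: str) -> None:
    """Record that `follower` wants posts from `author`."""
    followers[author].add(follower)


def publish(author: str, post_id: str) -> None:
    """Write path: do the assembly work once, at publish time."""
    for user in followers[author]:
        # appendleft keeps newest first; a full deque drops its oldest entry.
        feeds[user].appendleft(post_id)


def read_feed(user: str) -> list:
    """Read path: a plain lookup of the precomputed feed."""
    return list(feeds[user])
```

The cost moves from reads to writes, which suits a feed that is read far more often than it is posted to. An author with a very large following makes each publish expensive, so a real design may need a different path for that case.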

Media delivery

Images and video should not always come from origin.
Serve blobs through a CDN.
That solves distance, not ranking logic.
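Whether a CDN may store a response is usually signalled through the HTTP `Cache-Control` header. The sketch below picks a header value from two assumed flags; the function name, the flags, and the default lifetimes are illustrative, while the directives themselves (`public`, `private`, `no-cache`, `max-age`, `immutable`) are standard.

```python
ONE_YEAR_SECONDS = 31_536_000


def cache_control(is_public: bool, is_versioned: bool, max_age: int = 300) -> str:
    """Pick a Cache-Control value for a media or page response."""
    if not is_public:
        # Per-user content: shared caches such as CDNs must not store it,
        # and the browser must revalidate before reusing its copy.
        return "private, no-cache"
    if is_versioned:
        # The URL changes whenever the bytes change (for example, a hash
        # in the filename), so any cache may keep it for a long time.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    # Public but mutable: let caches hold it for a short window.
    return f"public, max-age={max_age}"
```

An image served from a content-hashed URL would use the versioned branch; a signed-in user's private media would use the first.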

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would cache product details that change rarely, but keep inventory or price freshness rules explicit.
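The model move above can be sketched as a read path that caches slow-changing details and always fetches money-or-trust fields from the source of truth. The field names, the loader callbacks, and the plain dictionary cache are assumptions for illustration; expiry and invalidation of the cached details are left out to keep the split visible.

```python
from typing import Callable, Dict

# Assumed fields that affect money or trust and must never be served stale.
FRESH_FIELDS = ("price_cents", "in_stock")


def product_view(
    product_id: str,
    load_details: Callable[[str], dict],
    load_live: Callable[[str], dict],
    cache: Dict[str, dict],
) -> dict:
    """Merge cached details with live price and inventory."""
    details = cache.get(product_id)
    if details is None:
        loaded = load_details(product_id)
        # Guard: never let a freshness-sensitive field slip into the cache.
        details = {k: v for k, v in loaded.items() if k not in FRESH_FIELDS}
        cache[product_id] = details
    live = load_live(product_id)  # always read from the source of truth
    return {**details, **live}
```

Writing the fresh fields down as a named list is what makes the freshness rule explicit rather than implied.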

We do

Complete the missing piece.

For the catalog prompt, mark which reads can be stale and which ones affect money or trust.

You do

Answer without notes.

Design the read path with one copy layer and one fallback path.

Practice

Try it before you read the model answer.

Prompt

Design a public product catalog with millions of views and relatively few updates.

What is the main read bottleneck likely to be?
Would you use cache, replicas, CDN, or precomputed views?
What trade-off are you accepting?

Strong model answer

I would expect the system to be read-heavy, with many users viewing the same product pages far more often than products are updated. My first move would be caching product page data because repeated reads are likely the main pattern. If the application still needs many database reads, I would add read replicas to take pressure off the primary. If product images are a major part of the traffic, I would serve them through a CDN. The main trade-off is better read latency and capacity in exchange for more staleness management and more moving parts.
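A back-of-envelope check can make the trade-off in this answer concrete. The helper names are hypothetical and any inputs are invented for illustration, not measurements from a real catalog.

```python
def origin_reads_per_second(total_reads_per_second: float,
                            cache_hit_ratio: float) -> float:
    """Reads that miss the cache and still reach the database tier."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_reads_per_second * (1.0 - cache_hit_ratio)


def reads_per_node(origin_rps: float, replicas: int,
                   primary_serves_reads: bool = True) -> float:
    """Spread the remaining reads evenly across the database nodes."""
    nodes = replicas + (1 if primary_serves_reads else 0)
    if nodes <= 0:
        raise ValueError("at least one node must serve reads")
    return origin_rps / nodes
```

With an assumed 10,000 reads per second and a 95% hit ratio, about 500 reads per second still reach the database; with four replicas plus the primary, that is about 100 per node. Replicas divide the volume, but each remaining query costs the same as before.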

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Name why reads are slow.

Repeated lookup, DB pressure, expensive assembly, or distant content.

Pick the smallest justified fix.

Cache, replicas, precompute, or CDN.

Do not confuse capacity with computation.

Replicas add read capacity. They do not make bad queries cheap.

Reusable interview line

"I would first name why the read path is slow, then choose the smallest justified move: cache for repeated hot reads, replicas for database read pressure, precomputed views for expensive assembly, and CDN for global blob delivery."