Part 3 · Designing at Scale · Chapter 9

Back-of-the-envelope estimation.

Enough math to guide the design, not enough to derail it.

Learning objective

Turn rough product assumptions into useful signals such as QPS, storage growth, bandwidth, and peak load, then say clearly what those numbers imply for architecture.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

Which number would change an image-sharing design first: users, writes, reads, storage, or peak traffic?

Commit

Write a rough answer.

Before reading on, guess the system shape without calculating anything. Then check whether the numbers support it.

Connect

Notice where it returns.

Every later scaling chapter depends on estimates that guide cache, shard, queue, and storage choices.

Concrete first

The point is not arithmetic. The point is direction.

You usually do not need exact numbers in a system design interview. You need enough numeric shape to know whether you are building a desk lamp, a warehouse, or a stadium.

Good estimates answer questions like: is this hundreds of requests per second or hundreds of thousands? Is storage in gigabytes, terabytes, or petabytes? Is the system read-heavy or write-heavy? The goal is not to be exact. The goal is to make the next design decision defensible.

Mental model

Rough math is a flashlight, not an audit.

Use the math to reveal the shape of the problem, not to pretend your assumptions are precise.

If the flashlight already shows a small room, do not design a stadium. If it shows a warehouse, stop pretending one server will be enough.

First principles

Estimate only what changes the design.

Start with users, actions per user, object size, and retention.
Convert daily volume into per-second traffic when reasoning about load.
Separate reads from writes because they usually scale differently.
Apply a peak factor because real systems are not flat averages.
After the math, say the implication out loud: cache, shard, CDN, async, or keep it simple.
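The conversion steps above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name daily_to_qps, the default peak factor of 10, and the user and action counts are all assumptions made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_to_qps(events_per_day: float, peak_factor: float = 10.0) -> tuple[float, float]:
    """Return (average_qps, peak_qps) for a daily event count.

    peak_factor is an assumed multiplier, not a measured value.
    """
    average = events_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


# Hypothetical inputs: 1M users, 5 reads and 0.5 writes per user per day.
users = 1_000_000
read_avg, read_peak = daily_to_qps(users * 5)
write_avg, write_peak = daily_to_qps(users * 0.5)

# Reads and writes are reported separately because they scale differently.
print(f"reads:  ~{read_avg:.0f}/s average, ~{read_peak:.0f}/s peak")
print(f"writes: ~{write_avg:.0f}/s average, ~{write_peak:.0f}/s peak")
```

In an interview the same arithmetic is usually done by rounding 86,400 up to 100,000 seconds, which is close enough for an order-of-magnitude answer.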

Why it matters in interviews

Ground the design before you optimize it.

Weak

This system probably needs caching and sharding.

Strong

This looks read-heavy, roughly 10× more reads than writes, and peak reads may land in the low tens of thousands per second. That makes caching high leverage, while writes may still be manageable without early sharding.

The strong version ties assumptions to numbers and numbers to architecture. That is the whole point of the chapter.

Key ideas

Seven anchors.

Estimate only the numbers that change the design.
Order of magnitude is usually more useful than fake precision.
Start with users, key actions per user, average object size, and retention.
Convert daily numbers into per-second numbers when reasoning about traffic.
Apply a peak factor because real systems do not run at flat average load.
Separate reads from writes.
After estimating, state the architectural implication out loud.
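One way to practice "order of magnitude over fake precision" is to round every intermediate result to a single significant figure. The helper below is a small sketch of that habit; the name round_to_one_sig_fig is hypothetical and the sample values are only illustrative.

```python
import math


def round_to_one_sig_fig(value: float) -> float:
    """Round a number to one significant figure, e.g. 231.48 -> 200."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))  # position of the leading digit
    scale = 10 ** exponent
    return float(round(value / scale) * scale)


# Precise-looking results become numbers that are easy to say out loud.
for precise in (231.48, 1157.4, 86_400):
    print(f"{precise} -> about {round_to_one_sig_fig(precise):,.0f}")
```

The rounded value is rarely what changes a design decision; the exponent usually is.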

Speaking script

Lines for the estimation conversation.

Opening

I only want rough numbers that change architecture decisions.

Sketching

I will start with users, key actions per user, and average object size, then convert that into average and peak load.

Size

The important takeaway is not the exact number. It is whether this is small enough to stay simple or large enough to justify more infrastructure.

Trade-off

This estimate suggests the system is read-heavy, so caching matters more than early write sharding.

Defending

I would rather round hard and keep moving than spend five minutes pretending the inputs are exact.

Recovery

If one assumption is unknown, I will state a reasonable range, pick a midpoint, and keep the design conversation moving.
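The recovery line can be made concrete with a range instead of a single guess. The sketch below assumes a hypothetical product with 10M users and an unknown read rate somewhere between 5 and 15 reads per user per day; the function name qps_range and every input are invented for illustration.

```python
SECONDS_PER_DAY = 86_400


def qps_range(users: int, actions_low: float, actions_high: float) -> tuple[float, float, float]:
    """Return (low, midpoint, high) average QPS for an uncertain per-user rate."""
    low = users * actions_low / SECONDS_PER_DAY
    high = users * actions_high / SECONDS_PER_DAY
    midpoint = (low + high) / 2
    return low, midpoint, high


# Assumed range: the true reads-per-user figure is unknown, so bracket it.
low, midpoint, high = qps_range(users=10_000_000, actions_low=5, actions_high=15)
print(f"low ~{low:.0f}/s, midpoint ~{midpoint:.0f}/s, high ~{high:.0f}/s")
```

If the low and high ends point to the same design decision, the unknown assumption does not matter and the conversation can move on.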

Common mistakes

How candidates turn useful math into dead weight.

Spending too long on arithmetic that does not affect the design.
Using overly precise numbers that suggest false confidence.
Forgetting to estimate peak load.
Mixing up reads and writes.
Calculating storage without retention.
Presenting numbers without explaining what they mean architecturally.
Freezing because one assumption is unknown instead of stating a reasonable assumption and moving on.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Estimation is about exact math and impressive precision.

Better model

What to replace it with

Estimation is about order of magnitude and design direction.

Interview move

What to do in the room

Convert each number into a decision: cache, CDN, partition, async processing, or keep it simple.
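Turning numbers into decisions can be sketched as a small lookup. The thresholds below are purely illustrative assumptions, not rules from this chapter or from any real system; the point is the habit of attaching a decision to each number, and the function name implications is hypothetical.

```python
# Illustrative thresholds only. They are assumptions chosen for the example.
READ_HEAVY_RATIO = 5       # peak reads per peak write
HIGH_WRITE_QPS = 10_000    # peak writes/sec where partitioning is worth discussing
LARGE_STORAGE_TB = 100     # yearly growth where one node is unlikely to be enough


def implications(peak_read_qps: float, peak_write_qps: float, storage_tb_per_year: float) -> list[str]:
    """Map rough estimates to the design decisions they suggest."""
    notes = []
    if peak_read_qps >= READ_HEAVY_RATIO * peak_write_qps:
        notes.append("read-heavy: cache the read path first")
    if peak_write_qps >= HIGH_WRITE_QPS:
        notes.append("write pressure: consider partitioning or async processing")
    if storage_tb_per_year >= LARGE_STORAGE_TB:
        notes.append("large storage growth: plan for partitioned or object storage")
    if not notes:
        notes.append("keep it simple")
    return notes


# Hypothetical estimates for a read-heavy service with modest storage.
for note in implications(peak_read_qps=12_000, peak_write_qps=2_000, storage_tb_per_year=4):
    print(note)
```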

Trade-offs

Four estimation styles.

Estimation style	Good when	Weak when	Interview line
Skip estimation entirely	The problem is tiny and the numbers clearly do not matter much.	Architecture depends on scale and you are guessing blindly.	I want at least rough order-of-magnitude estimates so the design stays grounded.
Rough order-of-magnitude estimation (default)	You need fast guidance for architectural choices.	You stop before turning the numbers into implications.	I only need enough math to decide whether caching, a CDN, or sharding is justified.
Highly precise arithmetic	Exact capacity planning is the actual task.	The interview only needs direction and you burn time on detail.	I would rather round hard and keep moving than spend five minutes pretending the inputs are exact.
Average-only estimation	Traffic is stable and burstiness is low.	Peak traffic drives the real bottleneck.	I need a peak estimate too, because average load hides the actual pressure points.
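The last row of the table is easy to demonstrate: the same daily total can hide very different peaks. The sketch below uses a made-up hourly traffic profile (23 quiet hours and one busy hour) to compare average QPS with peak-hour QPS; the function name peak_to_average and the numbers are assumptions.

```python
SECONDS_PER_HOUR = 3600


def peak_to_average(hourly_requests: list[int]) -> tuple[float, float, float]:
    """Return (average_qps, peak_hour_qps, ratio) for a list of hourly counts."""
    average_qps = sum(hourly_requests) / (len(hourly_requests) * SECONDS_PER_HOUR)
    peak_qps = max(hourly_requests) / SECONDS_PER_HOUR
    return average_qps, peak_qps, peak_qps / average_qps


# Hypothetical day: 23 quiet hours and one busy evening hour.
hourly = [36_000] * 23 + [360_000]
average_qps, peak_qps, ratio = peak_to_average(hourly)
print(f"average ~{average_qps:.1f}/s, peak hour ~{peak_qps:.0f}/s, ratio ~{ratio:.1f}x")
```

Even the peak-hour figure is an average over that hour, so short bursts inside it can be higher still.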

Mini case study

URL shortener — just enough numbers to guide the design.

Assumptions

10M daily active users.
2 new short URLs per user per day.
100M redirects per day.
500 bytes per stored record.

Writes

20M new URLs / day.
~230 writes / sec average.
~2.3k writes / sec at peak, at a 10× peak factor.

Reads

100M redirects / day.
~1.16k reads / sec average.
~11.6k reads / sec at peak, at a 10× peak factor.

Storage implication

~10 GB / day of new records.
~3.65 TB / year (10 GB × 365) before replication overhead.
Reads outnumber writes about 5 to 1, so the read-heavy path suggests caching first.
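The case study's figures can be reproduced in a short script. The inputs come straight from the assumptions listed above; the 10× peak factor is inferred from the ratio between the average and peak figures, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 10  # inferred from the case study's average-to-peak ratio

# Assumptions from the case study.
daily_active_users = 10_000_000
new_urls_per_user = 2
redirects_per_day = 100_000_000
bytes_per_record = 500

# Writes: new short URLs.
writes_per_day = daily_active_users * new_urls_per_user   # 20M
write_qps = writes_per_day / SECONDS_PER_DAY               # roughly 230
write_qps_peak = write_qps * PEAK_FACTOR                   # roughly 2.3k

# Reads: redirects.
read_qps = redirects_per_day / SECONDS_PER_DAY             # roughly 1.16k
read_qps_peak = read_qps * PEAK_FACTOR                     # roughly 11.6k

# Storage, using decimal units.
storage_gb_per_day = writes_per_day * bytes_per_record / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1000

read_write_ratio = redirects_per_day / writes_per_day

print(f"writes: ~{write_qps:.0f}/s average, ~{write_qps_peak:.0f}/s peak")
print(f"reads:  ~{read_qps:.0f}/s average, ~{read_qps_peak:.0f}/s peak")
print(f"storage: ~{storage_gb_per_day:.0f} GB/day, ~{storage_tb_per_year:.2f} TB/year")
print(f"read/write ratio: ~{read_write_ratio:.0f}:1")
```

None of these numbers is large enough on its own to force sharding, which is why the case study lands on caching the read path as the first move.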

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "If reads are far larger than writes, I will optimize the read path before complicating the write path."

We do

Complete the missing piece.

For image sharing, estimate daily writes and daily reads, then write one architecture implication.

You do

Answer without notes.

Use rough numbers only. Stop when the next design decision is clear.

Practice

Try it before you read the model answer.

Prompt

Design a simple image-sharing service.

Estimate uploads per day.
Estimate image read QPS.
Estimate storage growth per year.
Name one design implication from each estimate.

Strong model answer

I would start with rough assumptions such as daily active users, uploads per user, reads per image, and average image size. From that I would estimate average and peak upload QPS, average and peak image read QPS, and yearly storage growth. If image reads are much higher than writes, that points toward caching and likely CDN use. If yearly storage is large but manageable, I would keep metadata in a database and image bytes in object storage without prematurely complicating the design.
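To see how the model answer plays out with numbers, here is a sketch with invented inputs. Every value (5M daily active users, 0.2 uploads and 50 image views per user per day, 500 KB per image, a 10× peak factor) is an assumption chosen only for illustration, and the function name estimate_image_service is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 10  # assumed multiplier from average to peak load


def estimate_image_service(dau: int, uploads_per_user: float,
                           views_per_user: float, image_bytes: int) -> dict[str, float]:
    """Rough upload, read, storage, and bandwidth estimates (decimal units)."""
    uploads_per_day = dau * uploads_per_user
    views_per_day = dau * views_per_user
    upload_qps = uploads_per_day / SECONDS_PER_DAY
    view_qps = views_per_day / SECONDS_PER_DAY
    return {
        "uploads_per_day": uploads_per_day,
        "upload_qps_avg": upload_qps,
        "upload_qps_peak": upload_qps * PEAK_FACTOR,
        "view_qps_avg": view_qps,
        "view_qps_peak": view_qps * PEAK_FACTOR,
        "storage_tb_per_year": uploads_per_day * image_bytes * 365 / 1e12,
        "read_bandwidth_gb_per_s_avg": view_qps * image_bytes / 1e9,
    }


# All inputs below are made-up assumptions, not figures from the chapter.
estimate = estimate_image_service(
    dau=5_000_000, uploads_per_user=0.2, views_per_user=50, image_bytes=500_000
)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

With these assumed inputs, reads dwarf uploads and read bandwidth is substantial, which supports the caching and CDN implication; yearly storage is in the hundreds of terabytes, which fits image bytes in object storage with metadata in a database.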

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Estimate what matters.

QPS, storage, bandwidth, peak load. Not decorative arithmetic.

Average is not enough.

Peak traffic often decides what breaks first.

Say what the numbers imply.

That is where the engineering judgment shows up.

Reusable interview line

"I only need rough order-of-magnitude numbers here. The real goal is to decide whether the design stays simple or whether scale justifies extra infrastructure."