System Design, Made Clear · Chapter 9 · Back-of-the-Envelope Estimation
Part 3 · Designing at Scale Chapter 9

Back-of-the-envelope estimation.

Enough math to guide the design, not enough to derail it.

Learning objective
Turn rough product assumptions into useful signals such as QPS, storage growth, bandwidth, and peak load, then say clearly what those numbers imply for architecture.
Before you read

Make a prediction first.

Predict

Answer before the explanation.

Which number would change an image-sharing design first: users, writes, reads, storage, or peak traffic?

Commit

Write a rough answer.

Guess the system shape before you calculate anything. Then check whether the numbers support it.

Connect

Notice where it returns.

Every later scaling chapter depends on estimates that guide cache, shard, queue, and storage choices.

Concrete first

The point is not arithmetic. The point is direction.

You usually do not need exact numbers in a system design interview. You need enough numeric shape to know whether you are building a desk lamp, a warehouse, or a stadium.

Good estimates answer questions like: is this hundreds of requests per second or hundreds of thousands? Is storage in gigabytes, terabytes, or petabytes? Is the system read-heavy or write-heavy? The goal is not to be exact. The goal is to make the next design decision defensible.

Mental model

Rough math is a flashlight, not an audit.

Use the math to reveal the shape of the problem, not to pretend your assumptions are precise.
[Figure: a flashlight lighting three rooms. Small room: stay simple. Large room: add structure. Warehouse: the design must change. Caption: order of magnitude is enough to tell you which room you are in.]
If the flashlight already shows a small room, do not design a stadium. If it shows a warehouse, stop pretending one server will be enough.
First principles

Estimate only what changes the design.

Decision flow

Assumptions → traffic → storage → implications.

  • Assumptions: DAU, actions / user, object size, retention.
  • Traffic: read QPS, write QPS, peak factor (avg ≠ peak).
  • Storage: bytes / object, objects / day, growth / year.
  • Implications: cache? shard? CDN? async? Say what the numbers mean.

If the flow ends at numbers instead of implications, the estimation did not do its job.
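
The flow can also be written as a few formulas. This is a sketch, and the symbols are labels chosen here rather than notation from the chapter: $U$ is daily active users, $a$ is actions of one kind (reads or writes) per user per day, $k$ is the peak factor, $b$ is bytes per stored object, and 86,400 is the number of seconds in a day.

```latex
\begin{aligned}
\text{QPS}_{\text{avg}} &= \frac{U \cdot a}{86{,}400} \\
\text{QPS}_{\text{peak}} &= k \cdot \text{QPS}_{\text{avg}} \\
\text{storage per day} &= U \cdot a_{\text{write}} \cdot b \\
\text{storage per year} &= 365 \cdot \text{storage per day}
\end{aligned}
```

Here $a_{\text{write}}$ is the value of $a$ for writes. The last step, turning these values into implications, has no formula. It is a judgment about what each number means for the design.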
Why it matters in interviews

Ground the design before you optimize it.

Weak
This system probably needs caching and sharding.
Strong
This looks read-heavy, roughly 10× more reads than writes, and peak reads may land in the low tens of thousands per second. That makes caching high leverage, while writes may still be manageable without early sharding.

The strong version ties assumptions to numbers and numbers to architecture. That is the whole point of the chapter.

Average vs peak

Average load hides the real pressure.

[Figure: average reads ≈ 1.2k QPS, peak reads ≈ 11.6k QPS. Caption: designing for the average is often designing for the wrong moment.]
Averages describe a calm day. Architecture usually breaks on the busy one.
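
To see how far the two can drift apart, here is a minimal Python sketch. The hourly request counts are hypothetical values invented for illustration, not data from the chapter. They describe a day with 20 quiet hours and 4 busy ones.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400

# Hypothetical hourly request counts for one day: 20 quiet hours, 4 busy hours.
# These are illustrative values, not data from the chapter.
hourly_requests = [1_000_000] * 20 + [20_000_000] * 4

# Average over the whole day.
average_qps = sum(hourly_requests) / SECONDS_PER_DAY

# Busiest hour, averaged over that hour only.
peak_qps = max(hourly_requests) / SECONDS_PER_HOUR

# How many times larger the busy hour is than the daily average.
peak_factor = peak_qps / average_qps

print(f"average:     {average_qps:,.0f} QPS")
print(f"peak hour:   {peak_qps:,.0f} QPS")
print(f"peak factor: {peak_factor:.1f}x")
```

The same 100M requests per day look like roughly 1.2k QPS on average but about 5.6k QPS in the busiest hour. Even the busiest-hour figure is an average over that hour, so bursts within it can be higher still.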
Key ideas

Seven anchors.

Speaking script

Lines for the estimation conversation.

Opening
I only want rough numbers that change architecture decisions.
Sketching
I will start with users, key actions per user, and average object size, then convert that into average and peak load.
Size
The important takeaway is not the exact number. It is whether this is small enough to stay simple or large enough to justify more infrastructure.
Trade-off
This estimate suggests the system is read-heavy, so caching matters more than early write sharding.
Defending
I would rather round hard and keep moving than spend five minutes pretending the inputs are exact.
Recovery
If one assumption is unknown, I will state a reasonable range, pick a midpoint, and keep the design conversation moving.
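
The recovery line can be sketched in a few lines of Python. Everything here is hypothetical: the helper name `estimate_from_range`, the user count, and the 1 to 5 uploads range are illustrative assumptions, not figures from the chapter.

```python
def estimate_from_range(low: float, high: float) -> float:
    """Return the arithmetic midpoint of a stated range."""
    if low > high:
        raise ValueError("low must not exceed high")
    return (low + high) / 2


# Assumed inputs: the uploads-per-user figure is unknown, so state a range.
dau = 10_000_000                      # assumed daily active users
uploads_low, uploads_high = 1, 5      # assumed range of uploads per user per day

midpoint = estimate_from_range(uploads_low, uploads_high)
daily_uploads = dau * midpoint        # the number to keep designing with
daily_uploads_low = dau * uploads_low
daily_uploads_high = dau * uploads_high

print(f"working estimate: {daily_uploads:,.0f} uploads/day")
print(f"stated range:     {daily_uploads_low:,} to {daily_uploads_high:,}")
```

If both ends of the range lead to the same design decision, the unknown does not matter yet. If they lead to different decisions, that is the assumption worth asking the interviewer about.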
Common mistakes

How candidates turn useful math into dead weight.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Estimation is about exact math and impressive precision.

Better model

What to replace it with

Estimation is about order of magnitude and design direction.

Interview move

What to do in the room

Convert each number into a decision: cache, CDN, partition, async processing, or keep simple.
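
As a rough illustration of that move, the sketch below maps three estimates to decisions. The function name `implications` and all three thresholds are assumptions made up for this example. Real limits depend on hardware, workload, and the datastore in use.

```python
# Illustrative thresholds only; real limits depend on hardware and workload.
READ_HEAVY_RATIO = 5            # reads >= 5x writes counts as read-heavy (assumption)
SINGLE_NODE_WRITE_QPS = 5_000   # assumed comfortable write rate for one primary
SINGLE_NODE_STORAGE_TB = 10     # assumed comfortable yearly growth for one node


def implications(peak_read_qps: float, peak_write_qps: float,
                 storage_tb_per_year: float) -> list[str]:
    """Turn three rough estimates into design decisions worth saying out loud."""
    decisions = []
    if peak_read_qps >= READ_HEAVY_RATIO * peak_write_qps:
        decisions.append("read-heavy: cache the read path first")
    if peak_write_qps > SINGLE_NODE_WRITE_QPS:
        decisions.append("write-heavy: consider partitioning or async processing")
    if storage_tb_per_year > SINGLE_NODE_STORAGE_TB:
        decisions.append("storage outgrows one node: plan for sharding or object storage")
    # No threshold crossed means the numbers do not justify extra infrastructure.
    return decisions or ["keep it simple"]


print(implications(11_600, 2_300, 3.65))
print(implications(100, 100, 0.1))
```

The value of writing it this way is that every number ends in a sentence about the design. An estimate that triggers no decision is a reason to keep the design simple.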

Trade-offs

Four estimation styles.

Estimation style: Skip estimation entirely
Good when: The problem is tiny and the numbers clearly do not matter much.
Weak when: Architecture depends on scale and you are guessing blindly.
Interview line: I want at least rough order-of-magnitude estimates so the design stays grounded.

Estimation style: Rough order-of-magnitude estimation (default)
Good when: You need fast guidance for architectural choices.
Weak when: You stop before turning the numbers into implications.
Interview line: I only need enough math to decide whether caching, CDN, or sharding are justified.

Estimation style: Highly precise arithmetic
Good when: Exact capacity planning is the actual task.
Weak when: The interview only needs direction and you burn time on detail.
Interview line: I would rather round hard and keep moving than spend five minutes pretending the inputs are exact.

Estimation style: Average-only estimation
Good when: Traffic is stable and burstiness is low.
Weak when: Peak traffic drives the real bottleneck.
Interview line: I need a peak estimate too, because average load hides the actual pressure points.

Mini case study

URL shortener — just enough numbers to guide the design.

Assumptions

  • 10M daily active users.
  • 2 new short URLs per user per day.
  • 100M redirects per day.
  • 500 bytes per stored record.
  • Peak traffic is about 10× the average (the peak factor behind the peak numbers below).

Writes

  • 20M new URLs / day.
  • ~230 writes / sec average.
  • ~2.3k writes / sec at peak.

Reads

  • 100M redirects / day.
  • ~1.16k reads / sec average.
  • ~11.6k reads / sec at peak.

Storage implication

  • ~10 GB / day of new records.
  • ~3.65 TB / year before replication overhead.
  • Read-heavy path suggests caching first.
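
The arithmetic behind these figures can be reproduced in a short Python sketch. The inputs come from the assumptions above, with one exception: `peak_factor = 10` is inferred from the ratio between the average and peak numbers. Variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Inputs from the case study assumptions.
dau = 10_000_000
urls_per_user_per_day = 2
redirects_per_day = 100_000_000
bytes_per_record = 500
peak_factor = 10  # inferred from the peak figures, not stated as an input

# Writes: new short URLs.
writes_per_day = dau * urls_per_user_per_day
avg_write_qps = writes_per_day / SECONDS_PER_DAY
peak_write_qps = avg_write_qps * peak_factor

# Reads: redirects.
avg_read_qps = redirects_per_day / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * peak_factor

# Storage, before replication overhead (decimal units: 1 GB = 1e9 bytes).
storage_gb_per_day = writes_per_day * bytes_per_record / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1_000

print(f"writes: {avg_write_qps:,.0f}/s average, {peak_write_qps:,.0f}/s peak")
print(f"reads:  {avg_read_qps:,.0f}/s average, {peak_read_qps:,.0f}/s peak")
print(f"storage: {storage_gb_per_day:.0f} GB/day, {storage_tb_per_year:.2f} TB/year")
print(f"read:write ratio: {redirects_per_day / writes_per_day:.0f}:1")
```

The script gives a read-to-write ratio of 5:1 and only a few terabytes of growth per year, which is why the read path gets attention first and storage does not yet force a more complex design.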
Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "If reads are far larger than writes, I will optimize the read path before complicating the write path."

We do

Complete the missing piece.

For image sharing, estimate daily writes and daily reads, then write one architecture implication.

You do

Answer without notes.

Use rough numbers only. Stop when the next design decision is clear.

Practice

Try it before you read the model answer.

Prompt
Design a simple image-sharing service.
  • Estimate uploads per day.
  • Estimate image read QPS.
  • Estimate storage growth per year.
  • Name one design implication from each estimate.
Strong model answer
I would start with rough assumptions such as daily active users, uploads per user, reads per image, and average image size. From that I would estimate average and peak upload QPS, average and peak image read QPS, and yearly storage growth. If image reads are much higher than writes, that points toward caching and likely CDN use. If yearly storage is large but manageable, I would keep metadata in a database and image bytes in object storage without prematurely complicating the design.
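
One possible way to put numbers on that answer is sketched below. Every input is a hypothetical assumption chosen for illustration. None of the values come from the chapter, and a different set of assumptions would be just as valid as long as the implications are stated.

```python
SECONDS_PER_DAY = 86_400

# Every input below is a hypothetical assumption chosen for illustration.
dau = 1_000_000            # daily active users
uploads_per_user = 1       # uploads per user per day
reads_per_image = 100      # views each uploaded image receives
image_bytes = 1_000_000    # average image size, about 1 MB
peak_factor = 5            # assumed ratio of peak to average traffic

uploads_per_day = dau * uploads_per_user
reads_per_day = uploads_per_day * reads_per_image

avg_upload_qps = uploads_per_day / SECONDS_PER_DAY
avg_read_qps = reads_per_day / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * peak_factor

# Decimal units: 1 TB = 1e12 bytes, 1 GB = 1e9 bytes.
storage_tb_per_year = uploads_per_day * image_bytes * 365 / 1e12
peak_read_gb_per_sec = peak_read_qps * image_bytes / 1e9

print(f"uploads: {uploads_per_day:,}/day, about {avg_upload_qps:.0f}/s average")
print(f"reads:   about {avg_read_qps:,.0f}/s average, {peak_read_qps:,.0f}/s peak")
print(f"storage: about {storage_tb_per_year:.0f} TB/year of image bytes")
print(f"peak read bandwidth: about {peak_read_gb_per_sec:.1f} GB/s")
```

Under these assumptions, reads outnumber uploads 100 to 1 and peak read bandwidth is several gigabytes per second, which points toward a CDN and caching. Hundreds of terabytes per year of image bytes points toward object storage, with only metadata in the database.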
Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Memory hook
Round hard. Reason clearly.
Recap

Three things to take into the room.

1

Estimate what matters.

QPS, storage, bandwidth, peak load. Not decorative arithmetic.

2

Average is not enough.

Peak traffic often decides what breaks first.

3

Say what the numbers imply.

That is where the engineering judgment shows up.

Reusable interview line
"I only need rough order-of-magnitude numbers here. The real goal is to decide whether the design stays simple or whether scale justifies extra infrastructure."