Part 3 · Designing at Scale · Chapter 11

Scaling writes.

How to accept more writes without losing control of correctness.

Learning objective

Identify the real write bottleneck and choose the right move, such as queueing, batching, partitioning, or safer ID generation, without losing sight of contention and correctness.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

What breaks first when writes grow: the database CPU, hot keys, ordering, durability, or downstream consumers?

Commit

Write a rough answer.

Before reading, name the exact write being accepted and the promise made after acceptance.

Connect

Notice where it returns.

Write scaling returns in analytics, messages, notifications, feed fan-out, and video processing.

Concrete first

Writes are harder because they change shared truth.

A read asks for a copy of the truth. A write tries to change the truth. That difference is why write scaling usually hurts earlier and more sharply than people expect.

When too many writes hit the same path, the problem is usually not just volume. The real issue is contention, coordination, burstiness, or a hidden central bottleneck like one hot counter or one overloaded partition.

First principles

Four common reasons writes back up.

Too many writers are updating the same record, key, or partition.
Each write needs coordination before it can be accepted.
Writes arrive in bursts faster than storage can absorb them.
Every writer depends on one central service such as a counter or ID allocator.
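The first and last reasons often show up together as one hot counter. A minimal sketch of one common mitigation follows: split the counter into several slots so writers rarely touch the same one. The class name `ShardedCounter` and the slot count are assumptions for illustration, and the slots here are plain in-memory integers standing in for separately stored rows.

```python
import random


class ShardedCounter:
    """Counter split into independent slots so writers rarely collide."""

    def __init__(self, num_shards: int = 8) -> None:
        # Each slot stands in for a separately stored (and separately locked) row.
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random slot so concurrent writers mostly touch different rows.
        slot = random.randrange(len(self.shards))
        self.shards[slot] += amount

    def value(self) -> int:
        # Reads pay the cost: they must sum every slot.
        return sum(self.shards)


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment()
print(counter.value())  # 1000
```

The design choice is a trade: writes get cheaper because contention is spread out, while reads get more expensive and may be slightly stale if slots are summed without a lock.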

Why it matters in interviews

Write-scaling answers should sound contention-aware, not slogan-driven.

Weak

We can shard the database.

Strong

If writes are spiky, I would first buffer them with a queue. If one partition is getting too many writes, I would spread ownership across partitions. If every server is generating IDs, I need a scheme that guarantees uniqueness without a single bottleneck.

The stronger answer connects the kind of pressure to the smallest justified fix.

Key ideas

Eight anchors.

Writes are harder because they change shared state.
Write bottlenecks often come from contention, coordination, or burstiness.
Queues help absorb spikes when writes do not all need immediate completion.
Batching helps when many small writes can be grouped efficiently.
Partitioning helps when writes can be spread by key or ownership.
Hotspots appear when too many writes land on one key, one partition, or one global counter.
ID generation matters because uniqueness schemes can quietly become write bottlenecks.
The best answer keeps correctness visible while reducing contention.

Speaking script

Lines for the write-scaling conversation.

Opening

I want to scale writes by identifying where contention is happening, not by assuming every system needs early sharding.

Sketching

If writes arrive in bursts, my first move may be to buffer them with a queue so the storage layer can absorb them steadily.
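To make the buffering claim concrete, here is a small simulation, not a real queue. It assumes time moves in ticks, that storage can take a fixed number of writes per tick, and that the function name `drain_steadily` is hypothetical. It shows a burst of 100 writes reaching storage at no more than 30 per tick.

```python
from collections import deque


def drain_steadily(arrivals_per_tick, writes_per_tick):
    """Simulate a buffer between bursty producers and a fixed-rate storage writer.

    Returns (writes sent to storage each tick, backlog left after each tick).
    """
    if writes_per_tick <= 0:
        raise ValueError("writes_per_tick must be positive")
    buffer = deque()
    written, backlog = [], []
    next_id = 0
    ticks = list(arrivals_per_tick)
    # Keep ticking after arrivals stop until the buffer is empty.
    while ticks or buffer:
        arriving = ticks.pop(0) if ticks else 0
        for _ in range(arriving):
            buffer.append(next_id)
            next_id += 1
        # Storage only ever sees up to writes_per_tick writes in one tick.
        batch = min(writes_per_tick, len(buffer))
        for _ in range(batch):
            buffer.popleft()
        written.append(batch)
        backlog.append(len(buffer))
    return written, backlog


written, backlog = drain_steadily([100, 0, 0, 10], writes_per_tick=30)
print(written)  # [30, 30, 30, 20]
print(backlog)  # [70, 40, 10, 0]
```

The backlog column is the cost: the last write of the burst completes several ticks after it was accepted, which is only acceptable if the product does not need immediate completion.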

Deep dive

If many small writes can be combined safely, batching reduces overhead. If one partition is overloaded, I need to spread writes by a better partition key.
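The batching half of that line can be sketched as a writer that groups records and flushes them in one call. This is an illustration under assumptions: `BatchWriter` is a hypothetical name, and `sink` stands in for whatever persists a list of records in a single round trip.

```python
class BatchWriter:
    """Group small writes and hand them to storage in one call per batch."""

    def __init__(self, sink, max_batch=100):
        # `sink` is an assumed callable that persists a list of records at once.
        self.sink = sink
        self.max_batch = max_batch
        self.pending = []

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        # Swap the list first so a failing sink cannot double-send the same batch.
        if self.pending:
            batch, self.pending = self.pending, []
            self.sink(batch)


calls = []
writer = BatchWriter(sink=calls.append, max_batch=100)
for i in range(250):
    writer.write({"event_id": i})
writer.flush()  # push the final partial batch
print([len(batch) for batch in calls])  # [100, 100, 50]
```

Here 250 records cost 3 storage calls instead of 250. Records sitting in `pending` are not yet durable, so a production version would likely also flush on a timer and on shutdown.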

Trade-off

The gain is higher write capacity. The cost is usually more complexity, weaker or more delayed consistency, or harder debugging.

Extending

If every server must create unique IDs, I need a scheme that avoids collisions without turning one allocator into the bottleneck.
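One way to meet that requirement is a machine-aware ID that packs a timestamp, a machine number, and a per-millisecond sequence into one integer, in the style popularized by Snowflake IDs. The sketch below is an assumption-laden illustration: the bit widths (10 machine bits, 12 sequence bits), the class name, and the injectable `clock_ms` are all choices made here, and it assumes each server is given a distinct `machine_id` by some outside mechanism.

```python
import threading
import time


class MachineAwareIdGenerator:
    """Timestamp + machine id + sequence, so no shared allocator is needed."""

    TIMESTAMP_SHIFT = 22  # 10 machine bits + 12 sequence bits (assumed layout)
    MACHINE_SHIFT = 12
    MAX_MACHINE_ID = (1 << 10) - 1
    MAX_SEQUENCE = (1 << 12) - 1

    def __init__(self, machine_id, clock_ms=None):
        if not 0 <= machine_id <= self.MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                # Reusing an old timestamp could repeat an ID, so fail loudly.
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                (now << self.TIMESTAMP_SHIFT)
                | (self.machine_id << self.MACHINE_SHIFT)
                | self.sequence
            )


gen_a = MachineAwareIdGenerator(machine_id=1)
gen_b = MachineAwareIdGenerator(machine_id=2)
ids = [gen_a.next_id() for _ in range(1000)] + [gen_b.next_id() for _ in range(1000)]
print(len(set(ids)))  # 2000 distinct IDs, with no coordination between generators
```

Uniqueness here depends on two things the code cannot verify by itself: machine IDs are never shared, and the clock does not jump backwards without the generator noticing.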

Defending

I would rather fix the specific contention point first than jump directly to full sharding.

Common mistakes

How candidates make write scaling sound easier than it is.

Jumping straight to sharding without identifying the write bottleneck.
Assuming more app servers automatically solve write contention.
Forgetting that one hot key or one global counter can dominate the whole system.
Using a queue for work that truly must commit before the user gets a response.
Ignoring batching even when the system writes many small records continuously.
Designing ID generation as a single central bottleneck.
Scaling write throughput while ignoring correctness, deduplication, or ordering requirements.
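The last mistake is easy to make once a queue is in the path, because many queues can deliver the same message more than once. A minimal sketch of deduplication by event ID follows. `IdempotentStore` is a hypothetical in-memory stand-in, and it assumes every event carries a stable, unique `event_id` assigned by the producer.

```python
class IdempotentStore:
    """Apply each event at most once, keyed by a producer-assigned event id."""

    def __init__(self):
        self.applied = {}  # event_id -> stored record

    def apply(self, event_id, record):
        # A redelivered event is recognized and skipped instead of double-counted.
        if event_id in self.applied:
            return False
        self.applied[event_id] = record
        return True


store = IdempotentStore()
deliveries = [
    ("evt-1", {"ad_id": "ad-1"}),
    ("evt-2", {"ad_id": "ad-2"}),
    ("evt-1", {"ad_id": "ad-1"}),  # duplicate delivery of the first event
]
results = [store.apply(event_id, record) for event_id, record in deliveries]
print(results)             # [True, True, False]
print(len(store.applied))  # 2
```

In a real store the check and the insert would need to be one atomic operation, for example a unique constraint on the event ID, or two workers could both pass the check.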

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Scaling writes mainly means sharding the database.

Better model

What to replace it with

Write scaling means accepting writes safely, spreading pressure across partitions, buffering bursts, and preserving the correctness promise.

Interview move

What to do in the room

Separate acceptance from processing, then decide what must be durable before returning success.

Trade-offs

Five write-scaling choices.

Write-scaling choice	Good when	Weak when	Interview line
Keep a simple single write path	Write volume is still manageable and correctness is easier with one primary path.	The write path is saturated or one node is the obvious bottleneck.	I would keep the write path simple until contention or saturation actually shows up.
Queue and absorb bursts (default)	Writes arrive unevenly and some completion can happen shortly after the request.	The user must know the final durable result immediately for every step.	A queue helps if the main problem is bursty write load rather than strict immediate completion.
Batch writes	Many small writes can be grouped without harming the product requirement.	Each write must be visible or durable immediately and independently.	Batching helps when the overhead per write is the problem and slight delay is acceptable.
Partition or shard writes	Writes can be spread naturally by key, tenant, region, or ownership.	The partition key creates hotspots or queries become much harder than before.	Partitioning helps only if the chosen key actually spreads write ownership well.
Coordinated or distributed ID generation	Many servers create records concurrently and uniqueness must hold.	One central allocator becomes the write bottleneck or collision handling is weak.	I need an ID strategy that guarantees uniqueness without forcing every write through one hot counter.

Mini case study

URL shortener — the write path is small, but uniqueness is not free.

Core write

Client submits a long URL.
Service creates a short code.
Service stores the mapping.

Where it gets hard

Multiple servers create codes concurrently.
Collisions must not happen.
One global allocator can become hot.

Clean progression

Small system: one generator is fine.
Larger system: give writers safe ID ranges or machine-aware IDs.
Hot tenants: partition ownership better.
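The "safe ID ranges" step can be sketched as follows: a shared allocator hands out blocks of numbers, and each server turns numbers from its own block into short codes without further coordination. Every name here (`RangeAllocator`, `ShortCodeWriter`, `base62`) and the block size of 1000 are assumptions for illustration. In practice the allocator would be something like one atomic increment in a database rather than an in-memory object.

```python
import string
import threading

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def base62(n):
    """Encode a non-negative integer as a short base-62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class RangeAllocator:
    """Stand-in for the shared allocator; it is touched once per block, not per write."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.next_start = 0
        self.calls = 0
        self.lock = threading.Lock()

    def lease(self):
        with self.lock:
            start = self.next_start
            self.next_start += self.block_size
            self.calls += 1
            return start, start + self.block_size


class ShortCodeWriter:
    """One per server: draws codes from its leased block, then leases another."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.current, self.end = 0, 0  # empty range forces a lease on first use

    def next_code(self):
        if self.current >= self.end:
            self.current, self.end = self.allocator.lease()
        code = base62(self.current)
        self.current += 1
        return code


allocator = RangeAllocator(block_size=1000)
server_a, server_b = ShortCodeWriter(allocator), ShortCodeWriter(allocator)
codes = [server_a.next_code() for _ in range(1500)]
codes += [server_b.next_code() for _ in range(1500)]
print(len(set(codes)), allocator.calls)  # 3000 4
```

Three thousand unique codes cost four trips to the shared allocator. The price is that a server that crashes mid-block leaves its unused numbers unissued, and sequential codes are guessable unless they are scrambled afterwards.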

Separate side writes

Analytics writes can often be queued or batched.
The mapping write needs strict uniqueness immediately.
Do not mix those two lanes casually.

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would accept click events quickly into a durable log, then process aggregates asynchronously.
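A toy version of that move, assuming a local append-only file as the durable log: acceptance appends and forces the event to disk before acknowledging, and aggregation is a separate pass over the log. The function names, the JSON-lines format, and the `ad_id` field are assumptions made for this sketch. A real system would more likely use a replicated log service than one local file.

```python
import json
import os
import tempfile
from collections import Counter


def accept_click(log_file, event):
    """Append one event and force it to disk before acknowledging."""
    log_file.write(json.dumps(event) + "\n")
    log_file.flush()             # push Python's buffer to the operating system
    os.fsync(log_file.fileno())  # ask the OS to push it to the storage device
    return True                  # only now is it safe to tell the client "accepted"


def aggregate_clicks(log_path):
    """Separate, later step: read the log and build per-ad counts."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["ad_id"]] += 1
    return counts


log_path = os.path.join(tempfile.mkdtemp(), "clicks.log")
with open(log_path, "a", encoding="utf-8") as log_file:
    for ad_id in ["ad-1", "ad-2", "ad-1"]:
        accept_click(log_file, {"ad_id": ad_id})

print(dict(aggregate_clicks(log_path)))  # {'ad-1': 2, 'ad-2': 1}
```

The promise made at acceptance is only "your click is recorded"; the aggregate may lag. Syncing on every event is slow, so real logs usually sync once per group of events, which is the batching idea applied to durability.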

We do

Complete the missing piece.

For ad clicks, identify the hot partition risk and one way to spread writes.

You do

Answer without notes.

Answer the practice prompt with acceptance, buffering, processing, and failure handling as separate steps.

Practice

Try it before you read the model answer.

Prompt

Design a system that records large volumes of click events from ads.

What is the likely write bottleneck?
Would you batch, queue, partition, or change ID generation?
What trade-off are you accepting?

Strong model answer

I would expect bursty high-volume writes, so my first concern would be protecting the storage layer from direct spikes. I would likely use a queue to absorb bursts and workers to write steadily downstream. If many small events are being persisted, batching would reduce per-write overhead. I would partition by a key that spreads traffic well, such as time plus source or tenant, so one hot partition does not take all writes. The main trade-off is better write throughput and resilience versus more delayed visibility and more operational complexity.
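The partition-key point in that answer can be checked with a small experiment. The sketch below compares a tenant-only key with a composite key of tenant, source, and minute bucket for one hot tenant. The helper name `stable_partition`, the 16 partitions, and the synthetic event tuples are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def stable_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# One hot tenant sending clicks from 200 sources within the same minute bucket.
events = [("tenant-hot", f"source-{i}", 28_000_000) for i in range(200)]

by_tenant = Counter(stable_partition(t, 16) for t, _, _ in events)
by_composite = Counter(stable_partition(f"{t}|{s}|{m}", 16) for t, s, m in events)

print(len(by_tenant))     # 1: every write for the hot tenant hits one partition
print(len(by_composite))  # many partitions share the same 200 writes
```

The composite key spreads the hot tenant's writes, but any query for "all clicks for this tenant" now has to read from many partitions, which is the query-side cost named in the trade-offs table.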

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Name what writers are fighting over.

A hot key, a hot partition, a bursty path, or a central allocator.

Reduce contention before redesigning everything.

Queue, batch, repartition, or fix ID generation.

Separate critical writes from side-effect writes.

They rarely deserve the same latency and correctness contract.

Reusable interview line

"I would first name where the write contention lives, then choose the smallest justified move: queue bursts, batch tiny writes, repartition hotspots, or change the ID strategy so one allocator does not control every write."