Articulet · System Design, Made Clear
Part 4 · Real Interview Systems · Chapter 17 · Rate Limiter

The rate limiter.

Deciding who gets through, how often, and with what fairness.

Learning objective
Design a rate limiter by clarifying what is being limited, where the limiter sits, which algorithm fits, and how to reason about distributed counters and fairness in an interview.
Before you read

Make a prediction first.

Predict

Answer before the explanation.

What does a rate limiter have to be fair about: user, API key, IP, region, or endpoint?

Commit

Write a rough answer.

Before reading, choose the key, window, and failure behavior for a public API.

Connect

Notice where it returns.

Rate limiting ties together gateway behavior, distributed counters, consistency, latency, and abuse control.

Plain English

The system asks one question: should this request be allowed right now?

A rate limiter protects systems from overload and abuse by deciding whether the next request should pass or be rejected. The core idea is small, but the details matter: who is being limited, how the rule works, where the limiter sits, and whether the state must work across many servers.

The clean first version is usually simple: define the limiting key, define the rule, keep current usage in a fast store, then allow or reject quickly. That is enough to make the chapter practical without getting lost in low-value implementation detail.

Reasonable v1 scope
  • Define the key being limited.
  • Define the limit window or token rule.
  • Store current usage in a fast store.
  • Allow or reject quickly.
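
The four steps above can be sketched in a few lines. This is a minimal illustration, not a production design: the class name `FixedWindowLimiter`, the in-memory `counts` dict, and the key `api_key_123` are assumptions made for the example, and the dict stands in for whatever fast store a real system would use.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    limit: int            # the rule: how many requests are allowed
    window_seconds: int   # the rule: over what window
    # (key, window index) -> requests seen. An in-memory stand-in for a fast store.
    # Old windows are never purged here; a real store would expire them.
    counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        bucket = (key, window)
        used = self.counts.get(bucket, 0)
        if used >= self.limit:
            return False              # reject quickly
        self.counts[bucket] = used + 1
        return True                   # allow quickly


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("api_key_123", now=0.0) for _ in range(4)])
```

The key is just a string here, so switching from API key to user or IP changes what is passed in, not how the limiter works.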
Clarifications that matter
  • User, IP, API key, or tenant?
  • Per second, per minute, burst plus sustained rate?
  • Gateway, service edge, or app code?
  • Single instance or many servers?

The most useful simplification is this: before choosing an algorithm, define who you are limiting.

Why it matters in interviews

This problem exposes whether you start from product behavior or from tool habit.

Interviewers like the rate limiter question because it tests whether the candidate can identify the right limiting key, pick a reasonable algorithm, talk about distributed state, and discuss correctness versus simplicity without getting abstract.

Weak opener
Use Redis and count requests.
Strong opener
I first need to clarify whether the limit is per API key, per user, or per IP, because that changes the state model. If I want to allow short bursts but cap sustained rate, token bucket is a good fit. If the limiter runs across many application servers, the counter state needs a shared fast store.

The stronger answer starts with the rule and the identity, then chooses the mechanism.
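
The token bucket named in the strong opener can be sketched as follows. This is a single-process illustration under stated assumptions: the class name `TokenBucket` and its parameters `capacity` and `refill_per_second` are hypothetical, and one bucket would be kept per limiting key.

```python
import time
from typing import Optional


class TokenBucket:
    """Holds up to `capacity` tokens and refills at a steady rate.

    `capacity` bounds the burst size; `refill_per_second` bounds the sustained rate.
    """

    def __init__(self, capacity: float, refill_per_second: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_per_second=1, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]   # five pass, the sixth is rejected
later = [bucket.allow(now=2.0) for _ in range(3)]   # two tokens refilled after 2 seconds
print(burst, later)
```

A burst of five passes immediately, but after that the caller is held to roughly one request per second, which is the "bursts allowed, sustained rate capped" behavior.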

Mental model

A bouncer with a clicker.

The bouncer needs a rule, an identity, and a current count.
  • Rule: 100 requests / minute (what is allowed)
  • Identity: API key_123 (who is being limited)
  • Current count: 63 / 100 (what remains right now)
In a distributed system, the bouncer may be standing at many doors at once. That is what makes the state discussion interesting.
Key ideas

Six anchors.

Core diagram

Keep the first diagram to one gate and one counter store.

[Diagram: Client, then Rate limiter (allow or reject quickly), then Application (expensive path). The rate limiter consults a Counter store (fast shared state). It answers three questions: who? how much? how fast?]
This is enough to teach the whole idea: key, rule, shared count, then early allow or reject before the expensive downstream path.
Speaking script

Lines you can actually say out loud.

Opening
The first thing I want to clarify is what entity is being limited: user, IP, API key, tenant, or something else.
Sketching
I want the limiter early in the request path so expensive downstream systems are protected.
Deep dive
If short bursts are acceptable but sustained abuse is not, a token bucket style model is a good fit.
Deep dive
If the limiter runs across multiple app servers, the limit state needs a shared fast store or another coordination strategy.
Defending
The trade-off is stricter fairness versus more coordination, more latency, and more complexity.
Defending
For many systems, slight inaccuracy is acceptable if the limiter still protects the service effectively.
Common mistakes

Predictable ways this answer goes wrong.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

A rate limiter is just a Redis counter.

Better model

What to replace it with

A rate limiter is a policy plus algorithm plus distributed state problem, with fairness and failure behavior explicitly chosen.

Interview move

What to do in the room

State the limit key, algorithm, storage location, and fallback when the limiter store is slow.
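
The fallback choice can be made explicit in code. The sketch below is an assumption-heavy illustration: `StoreUnavailable` is a hypothetical exception standing in for whatever timeout or connection error a real counter store would raise, and `check_with_fallback` is an invented helper name.

```python
from typing import Callable


class StoreUnavailable(Exception):
    """Hypothetical error: the counter store timed out or is down."""


def check_with_fallback(check: Callable[[str], bool], key: str,
                        fail_open: bool) -> bool:
    """Run the normal limiter check; if the store fails, apply the chosen policy.

    fail_open=True  -> let the request through (favor availability).
    fail_open=False -> reject the request (favor protection).
    """
    try:
        return check(key)
    except StoreUnavailable:
        return fail_open


def broken_store_check(key: str) -> bool:
    raise StoreUnavailable("counter store timed out")


print(check_with_fallback(broken_store_check, "api_key_123", fail_open=True))
print(check_with_fallback(broken_store_check, "api_key_123", fail_open=False))
```

Which default is right depends on the endpoint. One plausible framing is to fail open where an outage of the limiter should not become an outage of the API, and to fail closed where unthrottled traffic is the bigger risk.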

Trade-offs

The decisions that come up every time.

Each limiter choice below lists when it is good, when it is weak, and an interview line.

Local per-instance limiter
  • Good when: The system is small and rough protection is enough.
  • Weak when: Traffic is spread across many servers and fairness matters across the fleet.
  • Interview line: A local limiter is simple, but it is only an approximation once traffic hits multiple servers.

Shared distributed limiter (default)
  • Good when: Many servers handle requests and you want a more consistent global view.
  • Weak when: The added network hop and shared state are not justified yet.
  • Interview line: A shared limiter makes sense when distributed fairness matters more than absolute simplicity.

Fixed window counter
  • Good when: Simplicity matters most and boundary effects are acceptable.
  • Weak when: Burstiness at window edges makes the limit too easy to game.
  • Interview line: Fixed window is simple, but it is the least fair around boundaries.

Token bucket (default)
  • Good when: Short bursts are acceptable but sustained abuse should be capped.
  • Weak when: You need a different fairness model and burst allowance is undesirable.
  • Interview line: Token bucket is a good fit when I want to allow bursts while controlling the sustained rate.

Sliding window style approach
  • Good when: Fairer smoothing matters more than minimal implementation complexity.
  • Weak when: Simplicity is more important than tighter accuracy.
  • Interview line: A sliding model is more accurate, but I would only pay that cost if fairness really matters.
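
The boundary effect of a fixed window, and how a sliding approach avoids it, can be shown with a small comparison. This is a sketch under assumptions: it uses a sliding window log (one timestamp stored per request), which is only one of several sliding-style variants, and the names `fixed_window_allowed` and `SlidingWindowLog` are invented for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


def fixed_window_allowed(timestamps: List[float], limit: int,
                         window_seconds: int) -> int:
    """Count how many requests a fixed window counter lets through."""
    counts: Dict[int, int] = defaultdict(int)
    allowed = 0
    for t in timestamps:
        window = int(t // window_seconds)
        if counts[window] < limit:
            counts[window] += 1
            allowed += 1
    return allowed


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        log = self.events[key]
        # Drop timestamps that have aged out of the trailing window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


# Four requests straddling the minute boundary, with a limit of 2 per 60 seconds.
times = [59.0, 59.5, 60.0, 60.5]
sliding = SlidingWindowLog(limit=2, window_seconds=60)
print(fixed_window_allowed(times, limit=2, window_seconds=60))   # 4
print(sum(sliding.allow("api_key_123", t) for t in times))       # 2
```

The fixed window admits all four requests within 1.5 seconds because the counter resets at the boundary. The sliding log admits two, at the cost of storing a timestamp per allowed request.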
Deep dive

One limit across many servers means fairness becomes a distributed-state problem.

This is where the chapter becomes interesting. The limiter itself is simple. The hard part is making many doors act like one bouncer.

[Diagram, two panels. Local counters only (simple, but each server sees only part of the truth): Limiter A at 12 / 100, Limiter B at 41 / 100, Limiter C at 28 / 100. Shared distributed state (more coordination, more consistent fairness): Limiter A, Limiter B, and Limiter C all use one shared counter store at 81 / 100 total.]
Local counters are cheap and approximate. Shared counters are fairer and more coordinated. The right choice depends on how much consistency the product actually needs.
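
A small simulation makes the gap between the two models concrete. It is a deliberately simplified sketch: requests are spread round-robin across servers, there is a single limiting key and a single window, and the function name `allowed_total` is an assumption for the example.

```python
def allowed_total(num_servers: int, limit: int, requests: int, shared: bool) -> int:
    """Return how many requests get through for one key in one window.

    shared=False: every server keeps its own counter and enforces `limit` locally.
    shared=True:  every server checks the same counter.
    """
    counters = [0] * (1 if shared else num_servers)
    allowed = 0
    for i in range(requests):
        server = i % num_servers          # round-robin load balancing
        slot = 0 if shared else server
        if counters[slot] < limit:
            counters[slot] += 1
            allowed += 1
    return allowed


print(allowed_total(num_servers=3, limit=100, requests=1000, shared=False))  # 300
print(allowed_total(num_servers=3, limit=100, requests=1000, shared=True))   # 100
```

With three servers and purely local counters, a nominal limit of 100 becomes an effective limit of 300. Whether that looseness is acceptable is the product question behind the choice.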
Mini case study

Public login API with abuse risk.

This is the useful product-fairness test. The algorithm matters, but the real question is whose pain you are willing to create.

What key probably matters

  • IP for basic abuse protection.
  • Possibly account or device for more product-aware limits.

What makes it tricky

  • A strict per-account limiter can hurt real users who mistype passwords.
  • A strict per-IP limiter can hurt many users behind one shared network.

What helps

  • Separate policies for different keys.
  • Short burst tolerance with a stronger sustained cap.
  • Putting the limiter at the edge before the expensive auth path.

The lesson

  • Rate limiting is not only an algorithm choice. It is also a product fairness choice.
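
Separate policies for different keys can be combined so that a login attempt must pass every one of them. The sketch below is illustrative only: `Policy` and `MultiKeyLimiter` are hypothetical names, the limits are made-up numbers, fixed windows are used for brevity, and counts live in process memory rather than a shared store.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Policy:
    name: str             # which identity this policy keys on, e.g. "ip" or "account"
    limit: int
    window_seconds: int


class MultiKeyLimiter:
    """A request is allowed only if it is under the limit for every policy."""

    def __init__(self, policies: List[Policy]):
        self.policies = policies
        self.counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def allow(self, identities: Dict[str, str], now: float) -> bool:
        buckets = []
        for policy in self.policies:
            window = int(now // policy.window_seconds)
            bucket = (policy.name, identities[policy.name], window)
            if self.counts[bucket] >= policy.limit:
                return False          # rejected requests consume no quota
            buckets.append(bucket)
        for bucket in buckets:        # count the request only once all checks pass
            self.counts[bucket] += 1
        return True


limiter = MultiKeyLimiter([
    Policy("ip", limit=3, window_seconds=60),
    Policy("account", limit=2, window_seconds=60),
])
shared_ip = "203.0.113.7"
print(limiter.allow({"ip": shared_ip, "account": "alice"}, now=0.0))  # True
print(limiter.allow({"ip": shared_ip, "account": "alice"}, now=1.0))  # True
print(limiter.allow({"ip": shared_ip, "account": "alice"}, now=2.0))  # False
print(limiter.allow({"ip": shared_ip, "account": "bob"}, now=3.0))    # True
print(limiter.allow({"ip": shared_ip, "account": "carol"}, now=4.0))  # False
```

The last line shows the shared-network problem from the case study: carol has made no attempts, yet she is blocked because the IP limit is already used up by others behind the same address.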
Demo conversation

How a strong exchange sounds.

Interviewer
What do you need to clarify first in a rate limiter question?
Candidate
What identity is being limited and what kind of fairness matters. Per user, per IP, and per API key create different state shapes and abuse patterns.
Interviewer
Why are local counters alone not enough?
Candidate
Because requests may hit different app servers. Purely local state is cheap, but it can become unfair or inconsistent unless the traffic is sticky in a way the prompt explicitly allows.
Interviewer
So what trade-off are you really making?
Candidate
Fairness and correctness versus coordination cost. Stronger shared-state accuracy usually means more latency or more infrastructure on the critical path.
Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "I will limit by API key with a token bucket so short bursts are allowed but sustained abuse is blocked."

We do

Complete the missing piece.

For a public API, compare fixed window and token bucket in one sentence each.

You do

Answer without notes.

Answer the practice prompt and include what happens when the counter store is unavailable.

Practice

Try it before you read the model answer.

Prompt
Design a rate limiter for a public API used by many customers.
  • What key or keys would you limit on?
  • Where would the limiter sit?
  • What trade-off would you accept?
Strong model answer
I would likely limit on API key because that maps well to customer ownership, and I might also keep a secondary IP-based rule for abuse protection. I would place the limiter at the gateway or service edge so expensive application work is rejected early. If customers need short bursts but should not sustain very high traffic, a token bucket style limiter is a good fit. The main trade-off is better protection and fairness versus the added complexity of distributed counter state.
Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Memory hook
Who? How much? How fast?
Recap

Three things to take into the room.

1

First define who you are limiting.

The limiting key shapes everything else.

2

Protect the expensive path early.

A limiter that sits too late is less useful than it sounds.

3

Fairness and correctness are not free.

Distributed consistency always comes with a price.

Reusable interview line
"I would first define who is being limited, then place the limiter early in the request path, then choose the simplest algorithm and state model that provides enough fairness for the product."