Part 4 · Real Interview Systems · Chapter 17

The rate limiter.

Deciding who gets through, how often, and with what fairness.

Learning objective

Design a rate limiter by clarifying what is being limited, where the limiter sits, which algorithm fits, and how to reason about distributed counters and fairness in an interview.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

What does a rate limiter have to be fair about: user, API key, IP, region, or endpoint?

Commit

Write a rough answer.

Before reading, choose the key, window, and failure behavior for a public API.

Connect

Notice where it returns.

Rate limiting ties together gateway behavior, distributed counters, consistency, latency, and abuse control.

Plain English

The system asks one question: should this request be allowed right now?

A rate limiter protects systems from overload and abuse by deciding whether the next request should pass or be rejected. The core idea is small, but the details matter: who is being limited, how the rule works, where the limiter sits, and whether the state must work across many servers.

The clean first version is usually simple: define the limiting key, define the rule, keep current usage in a fast store, then allow or reject quickly. That is enough for a practical first answer without getting lost in low-value implementation detail.

Reasonable v1 scope

Define the key being limited.
Define the limit window or token rule.
Store current usage in a fast store.
Allow or reject quickly.
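
The four steps above can be sketched as a fixed window counter. This is a minimal illustration, not a production design: the class name, the in-memory dictionary standing in for the fast store, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        # Stand-in for the fast store: (key, window index) -> request count.
        # Old windows are never evicted here; a real store would expire them.
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        # Step 1 and 2: the key and the window define which counter applies.
        window_index = int(self.clock() // self.window_seconds)
        slot = (key, window_index)
        # Step 3: read current usage from the store.
        used = self.counts.get(slot, 0)
        # Step 4: allow or reject quickly.
        if used >= self.limit:
            return False
        self.counts[slot] = used + 1
        return True


# Usage with a fake clock: two requests per 60-second window per key.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
first_three = [limiter.allow("key-a") for _ in range(3)]  # [True, True, False]
```

The key is just a string here, so the same sketch works whether the caller passes a user ID, an IP, or an API key. Deciding which one to pass is the clarification the list above starts with.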

Clarifications that matter

User, IP, API key, or tenant?
Per second, per minute, burst plus sustained rate?
Gateway, service edge, or app code?
Single instance or many servers?

The most useful simplification is this: before choosing an algorithm, define who you are limiting.

Why it matters in interviews

This problem exposes whether you start from product behavior or from tool habit.

Interviewers like the rate limiter problem because it tests whether a candidate can identify the right limiting key, pick a reasonable algorithm, talk about distributed state, and weigh correctness against simplicity without drifting into abstraction.

Weak opener

Use Redis and count requests.

Strong opener

I first need to clarify whether the limit is per API key, per user, or per IP, because that changes the state model. If I want to allow short bursts but cap sustained rate, token bucket is a good fit. If the limiter runs across many application servers, the counter state needs a shared fast store.

The stronger answer starts with the rule and the identity, then chooses the mechanism.
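
To make the token bucket mentioned in the strong opener concrete, here is a minimal single-process sketch. The class and parameter names (`capacity`, `refill_per_second`) are assumptions for illustration, and the per-key state lives in a plain dictionary rather than a shared store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Bucket:
    tokens: float       # tokens currently available
    updated_at: float   # clock reading at the last refill


class TokenBucketLimiter:
    """Bursts up to `capacity`; sustained rate capped at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}

    def allow(self, key: str, cost: float = 1.0) -> bool:
        now = self.clock()
        bucket = self.buckets.get(key)
        if bucket is None:
            # A new key starts with a full bucket, so it may burst at once.
            bucket = Bucket(tokens=self.capacity, updated_at=now)
            self.buckets[key] = bucket
        # Refill lazily based on elapsed time, never above capacity.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(self.capacity,
                            bucket.tokens + elapsed * self.refill_per_second)
        bucket.updated_at = now
        if bucket.tokens < cost:
            return False
        bucket.tokens -= cost
        return True


# Usage with a fake clock: burst of 3, then 1 request per second.
now = [0.0]
limiter = TokenBucketLimiter(capacity=3, refill_per_second=1.0,
                             clock=lambda: now[0])
burst = [limiter.allow("api-key-1") for _ in range(4)]  # [True, True, True, False]
```

The burst size and the sustained rate are separate knobs, which is why this model fits the "short bursts allowed, sustained rate capped" requirement.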

Key ideas

Six anchors.

The most important clarification is what key you are limiting.
Limits may be strict windows, burst-friendly token models, or smoothed sliding models.
The limiter should sit early enough to protect the expensive downstream systems.
A distributed limiter usually needs shared state across many application instances.
Strict correctness costs more coordination and latency.
Small inaccuracy is often acceptable for much simpler operation.

Speaking script

Lines you can actually say out loud.

Opening

The first thing I want to clarify is what entity is being limited: user, IP, API key, tenant, or something else.

Sketching

I want the limiter early in the request path so expensive downstream systems are protected.

Deep dive

If short bursts are acceptable but sustained abuse is not, a token bucket style model is a good fit.

Deep dive

If the limiter runs across multiple app servers, the limit state needs a shared fast store or another coordination strategy.

Defending

The trade-off is stricter fairness versus more coordination, more latency, and more complexity.

Defending

For many systems, slight inaccuracy is acceptable if the limiter still protects the service effectively.

Common mistakes

Predictable ways this answer goes wrong.

Picking an algorithm before clarifying what key is being limited.
Putting the limiter too late in the stack to protect the expensive path.
Assuming local in-memory counters are enough when many servers handle traffic.
Treating fairness and strict correctness as free.
Ignoring burst behavior and designing only for flat average rates.
Forgetting to define what the client sees when the limit is exceeded.
Mixing authentication policy and rate-limiting policy without saying which is which.
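
One way to avoid the "what the client sees" mistake is to decide the rejection shape up front. The sketch below assumes an HTTP API and uses the standard 429 Too Many Requests status with a `Retry-After` header. The function name and the JSON body fields are hypothetical.

```python
import json
import math
from typing import Dict, Tuple


def rejection_response(retry_after_seconds: float) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for a rate-limited request."""
    # Retry-After carries whole seconds, so round up and never advertise zero.
    wait = max(1, math.ceil(retry_after_seconds))
    headers = {
        "Retry-After": str(wait),
        "Content-Type": "application/json",
    }
    # Hypothetical body shape; the point is that the client gets a clear signal.
    body = json.dumps({"error": "rate_limited", "retry_after_seconds": wait})
    return 429, headers, body


status, headers, body = rejection_response(12.3)
# status is 429 and headers["Retry-After"] is "13"
```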

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

A rate limiter is just a Redis counter.

Better model

What to replace it with

A rate limiter is a policy plus algorithm plus distributed state problem, with fairness and failure behavior explicitly chosen.

Interview move

What to do in the room

State the limit key, algorithm, storage location, and fallback when the limiter store is slow.
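
The fallback part of that statement can be made explicit in a few lines. This is a sketch under assumptions: `StoreUnavailable` is a hypothetical exception standing in for a timeout or connection error from the counter store, and `check` is whatever function performs the real limit check.

```python
from typing import Callable


class StoreUnavailable(Exception):
    """Hypothetical error: the counter store is slow or unreachable."""


def allow_with_fallback(check: Callable[[str], bool], key: str,
                        fail_open: bool) -> bool:
    """Run the limit check; if the store fails, apply the chosen policy.

    fail_open=True  -> let traffic through (favor availability).
    fail_open=False -> reject traffic (favor protection).
    """
    try:
        return check(key)
    except StoreUnavailable:
        return fail_open


def broken_store_check(key: str) -> bool:
    # Simulates a limiter whose backing store is down.
    raise StoreUnavailable(key)


public_read = allow_with_fallback(broken_store_check, "api-key-1", fail_open=True)
login_attempt = allow_with_fallback(broken_store_check, "203.0.113.7", fail_open=False)
# public_read is True, login_attempt is False
```

Which direction to fail is a product decision, not a technical default. A plausible split is to fail open on low-risk reads and fail closed on abuse-sensitive paths such as login, but the interview point is to say which one you chose and why.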

Trade-offs

The decisions that come up every time.

Limiter choice	Good when	Weak when	Interview line
Local per-instance limiter	The system is small and rough protection is enough.	Traffic is spread across many servers and fairness matters across the fleet.	A local limiter is simple, but it is only an approximation once traffic hits multiple servers.
Shared distributed limiter (default)	Many servers handle requests and you want a more consistent global view.	The added network hop and shared state are not justified yet.	A shared limiter makes sense when distributed fairness matters more than absolute simplicity.
Fixed window counter	Simplicity matters most and boundary effects are acceptable.	Burstiness at window edges makes the limit too easy to game.	Fixed window is simple, but it is the least fair around boundaries.
Token bucket (default)	Short bursts are acceptable but sustained abuse should be capped.	Bursts are undesirable and you need a strictly smooth rate.	Token bucket is a good fit when I want to allow bursts while controlling the sustained rate.
Sliding window style approach	Fairer smoothing matters more than minimal implementation complexity.	Simplicity is more important than tighter accuracy.	A sliding model is more accurate, but I would only pay that cost if fairness really matters.
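
The boundary effect in the fixed window row is easy to show with numbers. The sketch below replays the same burst through a fixed window counter and through a sliding window log, one of several sliding approaches. Both helper functions are illustrative and handle a single key.

```python
from collections import deque
from typing import Deque, Dict, List


def fixed_window_allowed(timestamps: List[float], limit: int, window: float) -> int:
    """Count requests admitted by a fixed window counter."""
    counts: Dict[int, int] = {}
    allowed = 0
    for t in timestamps:
        index = int(t // window)
        if counts.get(index, 0) < limit:
            counts[index] = counts.get(index, 0) + 1
            allowed += 1
    return allowed


def sliding_log_allowed(timestamps: List[float], limit: int, window: float) -> int:
    """Count requests admitted by a sliding window log."""
    recent: Deque[float] = deque()  # timestamps of admitted requests
    allowed = 0
    for t in timestamps:
        # Drop admitted requests that are now older than one full window.
        while recent and recent[0] <= t - window:
            recent.popleft()
        if len(recent) < limit:
            recent.append(t)
            allowed += 1
    return allowed


# Limit: 5 requests per 60 seconds. The burst straddles the window edge at t=60.
burst = [59.0] * 5 + [61.0] * 5
fixed = fixed_window_allowed(burst, limit=5, window=60.0)    # 10
sliding = sliding_log_allowed(burst, limit=5, window=60.0)   # 5
```

The fixed window admits twice the stated limit within two seconds, while the sliding log holds the line. The price is that the log stores one timestamp per admitted request instead of one counter per window.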

Deep dive

One limit across many servers means fairness becomes a distributed-state problem.

This is where the chapter becomes interesting. The limiter itself is simple. The hard part is making many doors act like one bouncer.

Local counters are cheap and approximate. Shared counters are fairer and more coordinated. The right choice depends on how much consistency the product actually needs.
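
A small simulation shows how far local counters can drift from the intended limit. It assumes one client, round-robin load balancing, and a single window. The function name and parameters are made up for the example.

```python
def admitted(requests: int, servers: int, limit: int, shared: bool) -> int:
    """Count requests from one key admitted within a single window."""
    shared_count = 0                 # one counter seen by every server
    local_counts = [0] * servers     # one private counter per server
    total = 0
    for i in range(requests):
        server = i % servers         # round-robin load balancing
        if shared:
            if shared_count < limit:
                shared_count += 1
                total += 1
        else:
            if local_counts[server] < limit:
                local_counts[server] += 1
                total += 1
    return total


local_total = admitted(requests=100, servers=4, limit=10, shared=False)   # 40
shared_total = admitted(requests=100, servers=4, limit=10, shared=True)   # 10
```

With four doors and a limit of 10 at each, the client gets 40 requests through. The shared counter enforces the intended 10, at the cost of a lookup against shared state on every request.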

Mini case study

Public login API with abuse risk.

This is the useful product-fairness test. The algorithm matters, but the real question is whose pain you are willing to create.

What key probably matters

IP for basic abuse protection.
Possibly account or device for more product-aware limits.

What makes it tricky

A strict per-account limiter can hurt real users who mistype passwords.
A strict per-IP limiter can hurt many users behind one shared network.

What helps

Separate policies for different keys.
Short burst tolerance with a stronger sustained cap.
Putting the limiter at the edge before the expensive auth path.
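
"Separate policies for different keys" can be sketched as two independent counters that must both have room. Everything here is illustrative: the class and function names are hypothetical, the limits are arbitrary, and window resets are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CountingPolicy:
    """One limit applied to one kind of key (window reset omitted)."""
    name: str
    limit: int
    counts: Dict[str, int] = field(default_factory=dict)

    def has_room(self, key: str) -> bool:
        return self.counts.get(key, 0) < self.limit

    def record(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1


def allow_login(ip: str, account: str,
                ip_policy: CountingPolicy,
                account_policy: CountingPolicy) -> bool:
    """Admit the attempt only if every policy still has room."""
    checks = [(ip_policy, ip), (account_policy, account)]
    if not all(policy.has_room(key) for policy, key in checks):
        return False
    # Record usage only for admitted attempts (a design choice for this sketch).
    for policy, key in checks:
        policy.record(key)
    return True


# Generous per-IP limit for shared networks, tighter per-account limit.
ip_policy = CountingPolicy(name="per-ip", limit=20)
account_policy = CountingPolicy(name="per-account", limit=5)

alice = [allow_login("198.51.100.4", "alice", ip_policy, account_policy)
         for _ in range(6)]   # five True values, then False
bob = allow_login("198.51.100.4", "bob", ip_policy, account_policy)  # True
```

Alice hits her account limit after five attempts, but Bob, on the same shared IP, is still admitted. That is the fairness behavior the case study asks for.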

The lesson

Rate limiting is not only an algorithm choice. It is also a product fairness choice.

Demo conversation

How a strong exchange sounds.

Interviewer

What do you need to clarify first in a rate limiter question?

Candidate

What identity is being limited and what kind of fairness matters. Per user, per IP, and per API key create different state shapes and abuse patterns.

Interviewer

Why are local counters alone not enough?

Candidate

Because requests may hit different app servers. Purely local state is cheap, but it can become unfair or inconsistent unless the traffic is sticky in a way the prompt explicitly allows.

Interviewer

So what trade-off are you really making?

Candidate

Fairness and correctness versus coordination cost. Stronger shared-state accuracy usually means more latency or more infrastructure on the critical path.

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "I will limit by API key with a token bucket so short bursts are allowed but sustained abuse is blocked."

We do

Complete the missing piece.

For a public API, compare fixed window and token bucket in one sentence each.

You do

Answer without notes.

Answer the practice prompt and include what happens when the counter store is unavailable.

Practice

Try it before you read the model answer.

Prompt

Design a rate limiter for a public API used by many customers.

What key or keys would you limit on?
Where would the limiter sit?
What trade-off would you accept?

Show a strong model answer

I would likely limit on API key because that maps well to customer ownership, and I might also keep a secondary IP-based rule for abuse protection. I would place the limiter at the gateway or service edge so expensive application work is rejected early. If customers need short bursts but should not sustain very high traffic, a token bucket style limiter is a good fit. The main trade-off is better protection and fairness versus the added complexity of distributed counter state.

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

First define who you are limiting.

The limiting key shapes everything else.

Protect the expensive path early.

A limiter that sits too late is less useful than it sounds.

Fairness and correctness are not free.

Distributed consistency always comes with a price.

Reusable interview line

"I would first define who is being limited, then place the limiter early in the request path, then choose the simplest algorithm and state model that provides enough fairness for the product."

A bouncer with a clicker.

Keep the first diagram to one gate and one counter store.