Articulet · System Design, Made Clear · Chapter 5 · Databases Made Simple
Part 2 · Core Building Blocks Chapter 5

Databases, made simple.

How to choose a database by access pattern — not by hype.

Learning objective
Choose a sensible primary database in interviews by starting from the system's reads, writes, data shape, and correctness needs — without sounding like a tech-stack catalog.
Before you read

Make a prediction first.

Predict

Answer before the explanation.

What should you know before choosing SQL, NoSQL, graph, or search?

Commit

Write a rough answer.

Write the primary read query and primary write operation before naming any database.

Connect

Notice where it returns.

Database choice returns in URL mapping, messages, feeds, counters, videos, and notifications.

Concrete first

"What database would you use?"

It's the most common follow-up in any system design interview, and the place where candidates most often reach for a brand name before they've stated the workload. The interviewer hears "Cassandra" and silently asks: okay, but why?

The clean way to think about databases is to flip the question. Don't start by asking which database to use. Start by asking: what question does this system ask most often? Then let the answer pick the storage shape.

Mental model

Pick the shelf by the question.

Imagine four kinds of storage. Each one is good at one kind of question.
  • Ledger: precise, related records → Relational
  • Locker: quick lookup by key → Key-value
  • Folder: flexible, self-contained objects → Document
  • Warehouse aisle: very large, partitioned, write-heavy → Wide-column
The metaphor is the brain's hook. The product names live in the corner, on purpose — they come after the shape.
First principles

A database is a tool optimized for certain questions and update patterns.

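Once you internalize that sentence, the four shapes stop being trivia and start being decisions you can derive from the workload. Before naming any database, ask what the system reads most often, what it writes, what shape the data takes, and how strict its correctness needs are.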

In real systems, you may use more than one store. But in interviews, start with the primary source of truth. Add stores only when scale or query shape forces the move.

The decision tree

One question. Four branches. Pick the shape that fits.

What question does this system ask most often?
  • "Joins, transactions, strict correctness?" → Relational
  • "Mostly lookup by one key?" → Key-value
  • "Each entity = one flexible object?" → Document
  • "Massive partitioned writes at scale?" → Wide-column
Start with the dominant access pattern. Evolve only when the bottleneck demands it.
The tree is shallow on purpose. If the answer doesn't fall out in 30 seconds, the workload isn't clear yet.
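
As a rough illustration, the tree can be written as a few lines of Python. This is a sketch, not a rule engine: the `Workload` fields and the `pick_shape` function are hypothetical names, and checking the branches in this particular order is an assumption, since the tree itself does not rank them.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a system's dominant access pattern."""
    needs_joins_or_transactions: bool = False
    mostly_single_key_lookup: bool = False
    entity_is_one_flexible_object: bool = False
    massive_partitioned_writes: bool = False


def pick_shape(w: Workload) -> str:
    """Walk the four branches in order; the first match wins (assumed order)."""
    if w.needs_joins_or_transactions:
        return "relational"
    if w.mostly_single_key_lookup:
        return "key-value"
    if w.entity_is_one_flexible_object:
        return "document"
    if w.massive_partitioned_writes:
        return "wide-column"
    # No branch matched: the workload is not clear yet.
    return "unclear: restate the dominant reads and writes"


url_shortener = Workload(mostly_single_key_lookup=True)
order_system = Workload(needs_joins_or_transactions=True)
print(pick_shape(url_shortener))  # key-value
print(pick_shape(order_system))   # relational
```

The fall-through case mirrors the caption: if no branch fits, the fix is to restate the workload, not to add a fifth branch.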
Why it matters in interviews

Database choice is where first-principles thinking is most visible.

Weak
We can use NoSQL for scale.
Strong
The main read path is lookup by short code, so a key-based model is a natural fit. If we also need strong relational workflows like billing or ownership management, I may keep that metadata in a relational store.

The strong version ties the database to access pattern, consistency needs, and evolution path. That's what gets the head-nod.

Key ideas

Six things to anchor on.

Speaking script

Lines that show you're choosing, not guessing.

Opening
I want to choose the primary store based on the dominant read and write pattern, not on scale buzzwords.
Key-value
If this system mostly asks "given this ID, return the record," a key-value model is a strong fit.
Relational
If I need transactions across related entities, a relational store is the cleaner starting point.
Document
If each record is naturally a flexible object I read or write as a whole, a document model may fit well.
Restraint
I only want wide-column when partition scale, write volume, or data layout actually justify the complexity.
Defending
I may use more than one store later, but the first version should stay grounded in the main workload.
Common mistakes

How candidates lose points on the database question.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Database choice starts with SQL versus NoSQL.

Better model

What to replace it with

Database choice starts with access pattern, consistency need, relationships, and growth shape.

Interview move

What to do in the room

State the query shape first: lookup by key, relationship query, append-and-scan, or search.

Trade-offs

The four shapes, side by side.

| Database shape | Good when | Weak when | Interview line |
| --- | --- | --- | --- |
| Relational (default) | Data is structured, related, and needs transactional correctness. | You're forcing massive simple key lookups or huge write-heavy partitions into a shape that doesn't fit. | I'd start relational because correctness across related records matters more here than maximal write scale. |
| Key-value | Reads are mostly by one key and the value behind it is simple to fetch. | You need rich querying, joins, or many ad hoc access patterns. | The main operation is lookup by key, so a key-value model keeps the hot path simple. |
| Document | Each entity is a flexible object and the app reads or writes the full object. | You need strong relational joins or normalized updates across many entities. | A document model fits because the record naturally travels as one object with flexible fields. |
| Wide-column | Data is very large, partitioned, and often write-heavy or time-ordered. | The system is small or the query pattern is relational and correctness-heavy. | I'd reach for wide-column only when partition scale and write throughput dominate. |
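
The wide-column row is usually the least intuitive of the four, so here is a toy picture of the layout it describes: rows grouped by a partition key and kept in time order inside each partition. This is an in-memory sketch only. The names `append_event` and `scan`, and the sensor data, are made up for illustration, and a real wide-column store adds distribution and durability that this ignores.

```python
from bisect import insort
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Toy layout: partition key -> rows kept sorted by timestamp.
partitions: DefaultDict[str, List[Tuple[int, float]]] = defaultdict(list)


def append_event(device_id: str, ts: int, reading: float) -> None:
    """Write path: touch only the partition that owns this device."""
    insort(partitions[device_id], (ts, reading))


def scan(device_id: str, start_ts: int, end_ts: int) -> List[Tuple[int, float]]:
    """Read path: a time-range scan inside one partition, never across all."""
    rows = partitions.get(device_id, [])
    return [(ts, r) for ts, r in rows if start_ts <= ts <= end_ts]


append_event("sensor-a", 3, 20.5)
append_event("sensor-a", 1, 19.0)
append_event("sensor-a", 2, 19.8)
append_event("sensor-b", 2, 30.1)
print(scan("sensor-a", 2, 3))  # [(2, 19.8), (3, 20.5)]
```

Every read and write names its partition up front, which is why the shape suits append-and-scan workloads and struggles with ad hoc relational queries.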
Mini case study

URL shortener — what database, and why.

The dominant question

  • "Given a short code, return the long URL."
  • That's a classic key-based lookup.

So the first-pass model is

  • Key: short code
  • Value: long URL plus a small amount of metadata
  • Hot path stays a single fast lookup.
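
A minimal sketch of that first-pass model, using a Python dict as a stand-in for the key-value store. `ShortLinkStore`, `LinkRecord`, the `created_by` field, and the sample code `abc123` are all hypothetical. The point is only that the primary read and the primary write each touch exactly one key.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkRecord:
    long_url: str
    created_by: str  # hypothetical piece of "small metadata"


class ShortLinkStore:
    """In-memory stand-in for a key-value store (a dict, not a real database)."""

    def __init__(self) -> None:
        self._data: Dict[str, LinkRecord] = {}

    def put(self, short_code: str, record: LinkRecord) -> None:
        # Primary write: store the value under the short code.
        self._data[short_code] = record

    def get(self, short_code: str) -> Optional[LinkRecord]:
        # Primary read: one lookup by key, no joins and no scans.
        return self._data.get(short_code)


store = ShortLinkStore()
store.put("abc123", LinkRecord("https://example.com/some/long/path", "user-42"))
hit = store.get("abc123")
print(hit.long_url if hit else "not found")
```

Notice what the interface cannot do: there is no "list all links for user-42" call. That missing query is what later pushes ownership and search into a supporting store.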

What could change the design later

  • Authenticated link management → relational metadata for ownership, billing, admin workflows.
  • High-scale analytics events → a separate write-optimized stream or store.
  • Search and filtering over links → key-value alone won't be enough.

The portable lesson

  • Choose the database for the main question first.
  • Layer in supporting stores only when a side question becomes a real bottleneck.
Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "For orders and payments, correctness and relationships matter, so I would start with a relational model."

We do

Complete the missing piece.

For the online store prompt in the Practice section, name one table-like relationship and one lookup that must be fast.

You do

Answer without notes.

Pick a database only after writing the two most important queries.

Practice

Try it before you read the model answer.

Prompt
"Design a simple order and payment system for an online store."
  • Which primary database shape would you choose first, and why?
Strong model answer
I'd start with a relational database because orders, payments, inventory, and users are related entities, and correctness matters more than raw write scale in the first version. I expect transactions and clear data relationships to matter here. If later parts of the system become high-scale or need specialized access patterns, I can add supporting stores, but I'd keep the source of truth relational first.
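
To make "transactions and clear data relationships" concrete, here is a small sketch using Python's built-in `sqlite3` module. The two-table schema, the `place_order` helper, and the sample amounts are assumptions for illustration, not a recommended production design. It shows one thing: an order and its payment are written together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        status TEXT NOT NULL
    );
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
    );
""")


def place_order(user_id: int, amount_cents: int) -> bool:
    """Insert an order and its payment atomically; roll back both on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO orders (user_id, status) VALUES (?, 'paid')",
                (user_id,),
            )
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (cur.lastrowid, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(place_order(1, 2500))  # valid payment, both rows are committed
print(place_order(1, -5))    # CHECK fails, so the order row is rolled back too
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

After the failed second call only one order remains, with no orphaned "paid" order lacking a payment. That guarantee is the correctness the model answer is trading write scale for.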
Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Memory hook
Question first. Database second.
Recap

Three things to take into the room.

1

Say the reads. Say the writes. Then choose the store.

Workload first; product name last (or never).

2

Four shapes are enough.

Ledger · Locker · Folder · Warehouse aisle. The metaphor anchors the recall.

3

Source of truth is one thing.

Caches, indexes, and analytics stores are extra — keep them off the first pass.

Reusable interview line
"Let me say the dominant reads and writes first — the database shape will follow from that, not from a brand."