Part 2 · Core Building Blocks · Chapter 5

Databases, made simple.

How to choose a database by access pattern — not by hype.

Learning objective

Choose a sensible primary database in interviews by starting from the system's reads, writes, data shape, and correctness needs — without sounding like a tech-stack catalog.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

What should you know before choosing SQL, NoSQL, graph, or search?

Commit

Write a rough answer.

Write the primary read query and primary write operation before naming any database.

Connect

Notice where it returns.

Database choice returns in URL mapping, messages, feeds, counters, videos, and notifications.

Concrete first

"What database would you use?"

It's the most common follow-up in any system design interview, and the place where candidates most often reach for a brand name before they've stated the workload. The interviewer hears "Cassandra" and silently asks: okay, but why?

The clean way to think about databases is to flip the question. Don't start by asking which database. Start by asking, "What question does this system ask most often?" Then let the answer pick the storage shape.

Mental model

Pick the shelf by the question.

Imagine four kinds of storage: a ledger (relational), a locker (key-value), a folder (document), and a warehouse aisle (wide-column). Each one is good at one kind of question.

The metaphor is the brain's hook. Product names come after the shape, on purpose.

First principles

A database is a tool optimized for certain questions and update patterns.

Once you internalize that sentence, the four shapes stop being trivia and start being decisions you can derive from the workload. Before naming any database, ask:

What do we read most often?
What do we write most often?
Do we mostly look things up by one key?
Do we need transactions across related records?
Is each record flexible like a document, or tightly structured like a table?
Is the system small and correctness-heavy, or huge and write-heavy?

In real systems, you may use more than one store. But in interviews, start with the primary source of truth. Add stores only when scale or query shape forces the move.
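
If it helps to see the checklist as a decision, here is a minimal sketch in Python. The field names, the Workload class, and the order in which the rules are checked are all assumptions made for illustration; treat the result as a rule of thumb for a first pass, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the checklist questions; every field name is illustrative."""
    lookup_by_single_key: bool    # reads are mostly "given this key, return the record"
    needs_transactions: bool      # updates must be atomic across related records
    flexible_whole_records: bool  # each record is read or written as one flexible object
    huge_and_write_heavy: bool    # partition scale or write volume dominates


def first_pass_shape(w: Workload) -> str:
    """Suggest a starting storage shape. The rule order is an assumption."""
    if w.needs_transactions:
        return "relational"
    if w.huge_and_write_heavy:
        return "wide-column"
    if w.lookup_by_single_key:
        return "key-value"
    if w.flexible_whole_records:
        return "document"
    return "relational"  # fall back to relational when nothing else dominates


url_shortener = Workload(
    lookup_by_single_key=True,
    needs_transactions=False,
    flexible_whole_records=False,
    huge_and_write_heavy=False,
)
order_system = Workload(
    lookup_by_single_key=False,
    needs_transactions=True,
    flexible_whole_records=False,
    huge_and_write_heavy=False,
)

print(first_pass_shape(url_shortener))  # key-value
print(first_pass_shape(order_system))   # relational
```

The point is not the function itself. It is that every branch reads a fact about the workload, and none of them reads a product name.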

Why it matters in interviews

Database choice is where first-principles thinking is most visible.

Weak

We can use NoSQL for scale.

Strong

The main read path is lookup by short code, so a key-based model is a natural fit. If we also need strong relational workflows like billing or ownership management, I may keep that metadata in a relational store.

The strong version ties the database to access pattern, consistency needs, and evolution path. That's what gets the head-nod.

Key ideas

Six things to anchor on.

Start with access patterns, not product names.
The primary database should match the dominant read and write path.
Strong consistency and relationships often point toward relational storage.
Simple key lookups often point toward key-value storage.
Large-scale flexible or write-heavy systems may justify document or wide-column models.
One system can use multiple stores, but the first-pass interview design should stay simple.

Speaking script

Lines that show you're choosing, not guessing.

Opening

I want to choose the primary store based on the dominant read and write pattern, not on scale buzzwords.

Key-value

If this system mostly asks "given this ID, return the record," a key-value model is a strong fit.

Relational

If I need transactions across related entities, a relational store is the cleaner starting point.

Document

If each record is naturally a flexible object I read or write as a whole, a document model may fit well.

Restraint

I only want wide-column when partition scale, write volume, or data layout actually justify the complexity.

Defending

I may use more than one store later, but the first version should stay grounded in the main workload.

Common mistakes

How candidates lose points on the database question.

Saying "NoSQL scales better" without naming the workload.
Picking a database before stating the dominant queries.
Adding several databases in the first version just to sound advanced.
Confusing the source of truth with a cache, search index, or analytics pipeline.
Ignoring transactions and consistency requirements.
Treating "flexible schema" as an automatic reason to avoid relational storage.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Database choice starts with SQL versus NoSQL.

Better model

What to replace it with

Database choice starts with access pattern, consistency need, relationships, and growth shape.

Interview move

What to do in the room

State the query shape first: lookup by key, relationship query, append-and-scan, or search.

Trade-offs

The four shapes, side by side.

Database shape	Good when	Weak when	Interview line
Relational (default)	Data is structured, related, and needs transactional correctness.	You're forcing massive simple key lookups or huge write-heavy partitions into a shape that doesn't fit.	I'd start relational because correctness across related records matters more here than maximal write scale.
Key-value	Reads are mostly by one key and the value behind it is simple to fetch.	You need rich querying, joins, or many ad hoc access patterns.	The main operation is lookup by key, so a key-value model keeps the hot path simple.
Document	Each entity is a flexible object and the app reads or writes the full object.	You need strong relational joins or normalized updates across many entities.	A document model fits because the record naturally travels as one object with flexible fields.
Wide-column	Data is very large, partitioned, and often write-heavy or time-ordered.	The system is small or the query pattern is relational and correctness-heavy.	I'd reach for wide-column only when partition scale and write throughput dominate.
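
The document and wide-column rows are the easiest to blur together, so here is a rough in-memory sketch of the two access patterns in Python. Nothing here is a real database client: the function names, the "sensor-1" partition, and the string payloads are hypothetical, and plain dictionaries stand in for the stores.

```python
import bisect
from collections import defaultdict

# Document shape: one flexible object, read and written as a whole.
profiles = {}  # document id -> nested dict


def save_profile(doc_id, doc):
    profiles[doc_id] = doc  # one write replaces the entire object


def load_profile(doc_id):
    return profiles.get(doc_id)


# Wide-column shape: rows grouped by a partition key and kept sorted by a
# clustering key (a timestamp here), so the store can append and range-scan.
events = defaultdict(list)  # partition key -> sorted list of (timestamp, payload)


def append_event(partition, timestamp, payload):
    # Payloads are plain strings in this sketch so the tuples stay comparable.
    bisect.insort(events[partition], (timestamp, payload))


def scan_events(partition, start, end):
    """Return rows with start <= timestamp < end from a single partition."""
    rows = events.get(partition, [])
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


save_profile("u1", {"name": "Ana", "tags": ["admin"], "prefs": {"theme": "dark"}})
append_event("sensor-1", 30, "c")
append_event("sensor-1", 10, "a")
append_event("sensor-1", 20, "b")

print(load_profile("u1")["prefs"]["theme"])  # dark
print(scan_events("sensor-1", 10, 30))       # [(10, 'a'), (20, 'b')]
```

Notice what each shape makes cheap: the document store hands back the whole object in one read, while the wide-column sketch answers "everything in this partition between two times" without touching other partitions.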

Mini case study

URL shortener — what database, and why.

The dominant question

"Given a short code, return the long URL."
That's a classic key-based lookup.

So the first-pass model is

Key: short code
Value: long URL plus a small amount of metadata
Hot path stays a single fast lookup.
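
As a minimal sketch of that model, assuming an in-memory dictionary in place of a real key-value store: the ShortLinkStore class, its method names, and the created_by field are hypothetical and only show the shape of the hot path.

```python
from typing import Optional


class ShortLinkStore:
    """Key: short code. Value: long URL plus a small amount of metadata."""

    def __init__(self) -> None:
        self._records = {}  # short code -> {"long_url": ..., "created_by": ...}

    def put(self, short_code: str, long_url: str, created_by: Optional[str] = None) -> bool:
        """Store a new mapping; refuse to overwrite an existing short code."""
        if short_code in self._records:
            return False
        self._records[short_code] = {"long_url": long_url, "created_by": created_by}
        return True

    def resolve(self, short_code: str) -> Optional[str]:
        """The hot path: one lookup by key, no joins, no scans."""
        record = self._records.get(short_code)
        return record["long_url"] if record else None


store = ShortLinkStore()
store.put("abc123", "https://example.com/a")
print(store.resolve("abc123"))  # https://example.com/a
print(store.resolve("nope"))    # None
```

The whole read path is a single get by key, which is what makes the key-value shape easy to defend here.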

What could change the design later

Authenticated link management → relational metadata for ownership, billing, admin workflows.
High-scale analytics events → a separate write-optimized stream or store.
Search and filtering over links → key-value alone won't be enough.
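
To make the first of those changes concrete, here is a hedged sketch using Python's built-in sqlite3 module as a stand-in for a relational store. The table names, columns, and sample rows are invented for illustration. The question being asked is no longer "given a short code" but "which links does this user own," and that is a relationship query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE links (
    short_code TEXT PRIMARY KEY,
    long_url   TEXT NOT NULL,
    owner_id   INTEGER NOT NULL REFERENCES users(id)
);
""")

with conn:  # commit both inserts together
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'ana@example.com')")
    conn.executemany(
        "INSERT INTO links (short_code, long_url, owner_id) VALUES (?, ?, ?)",
        [("abc123", "https://example.com/a", 1), ("xyz789", "https://example.com/b", 1)],
    )


def links_owned_by(email):
    """Relationship query: join links to their owner and filter by email."""
    rows = conn.execute(
        """
        SELECT l.short_code
        FROM links AS l
        JOIN users AS u ON u.id = l.owner_id
        WHERE u.email = ?
        ORDER BY l.short_code
        """,
        (email,),
    ).fetchall()
    return [row[0] for row in rows]


print(links_owned_by("ana@example.com"))  # ['abc123', 'xyz789']
```

The redirect hot path can stay a key lookup. This store only answers the ownership and admin questions that a key-value model handles poorly.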

The portable lesson

Choose the database for the main question first.
Layer in supporting stores only when a side question becomes a real bottleneck.

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "For orders and payments, correctness and relationships matter, so I would start with a relational model."

We do

Complete the missing piece.

For the online store prompt in the Practice section, name one table-like relationship and one lookup that must be fast.

You do

Answer without notes.

Pick a database only after writing the two most important queries.

Practice

Try it before you read the model answer.

Prompt

"Design a simple order and payment system for an online store."

Which primary database shape would you choose first, and why?

Strong model answer

I'd start with a relational database because orders, payments, inventory, and users are related entities, and correctness matters more than raw write scale in the first version. I expect transactions and clear data relationships to matter here. If later parts of the system become high-scale or need specialized access patterns, I can add supporting stores, but I'd keep the source of truth relational first.
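
If the interviewer asks what "transactions matter" means in practice, a small sketch like this can back the answer. It uses Python's built-in sqlite3 module as a stand-in relational store. The schema, the place_order function, and the sample "mug" item are assumptions made for illustration, not a production design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (
    sku   TEXT PRIMARY KEY,
    stock INTEGER NOT NULL CHECK (stock >= 0)  -- stock can never go negative
);
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    sku      TEXT NOT NULL REFERENCES inventory(sku),
    quantity INTEGER NOT NULL
);
CREATE TABLE payments (
    order_id     INTEGER NOT NULL REFERENCES orders(id),
    amount_cents INTEGER NOT NULL
);
INSERT INTO inventory (sku, stock) VALUES ('mug', 2);
""")


def place_order(sku, quantity, amount_cents):
    """Record the order, the payment, and the stock change as one unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            cur = conn.execute(
                "INSERT INTO orders (sku, quantity) VALUES (?, ?)", (sku, quantity)
            )
            conn.execute(
                "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
                (cur.lastrowid, amount_cents),
            )
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE sku = ?", (quantity, sku)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the stock update, so nothing was saved.
        return False


print(place_order("mug", 1, 1500))  # True
print(place_order("mug", 5, 7500))  # False
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

The second call fails on the stock check, and the order and payment rows written just before it are rolled back with it. That all-or-nothing behavior across related records is the reason to start relational here.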

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Say the reads. Say the writes. Then choose the store.

Workload first; product name last (or never).

Four shapes are enough.

Ledger · Locker · Folder · Warehouse aisle. The metaphor anchors the recall.

Source of truth is one thing.

Caches, indexes, and analytics stores are extra — keep them off the first pass.

Reusable interview line

"Let me say the dominant reads and writes first — the database shape will follow from that, not from a brand."