System Design, Made Clear · Chapter 7 · Storage Systems and CDNs
Part 2 · Core Building Blocks · Chapter 7

Storage systems and CDNs.

Where big files live, and how they get closer to users.

Learning objective
Distinguish clearly between metadata storage, object storage, and CDN delivery so you can explain large-file systems without mixing up queryable records, durable blobs, and fast global reads.
Before you read

Make a prediction first.

Predict

Answer before the explanation.

Why are big files usually not stored in the same place as ordinary rows?

Commit

Write a rough answer.

Before reading, split the system into metadata, bytes, and delivery path.

Connect

Notice where it returns.

This split returns in video upload, profile photos, feeds with media, and private content delivery.

Concrete first

Storage is not one thing.

Candidates say "store the file" as if that is one decision. It usually isn't. The facts about a file and the file bytes themselves are different jobs.

In a photo or video product, the system needs to remember things like user ID, upload time, privacy setting, processing status, and object location. Those are structured records. It also needs to hold the image or video bytes, which can be large, must be stored durably, and are downloaded repeatedly. Those are blobs. The records and the blobs usually should not live in the same place.
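A minimal sketch of that split, under stated assumptions: the names `PhotoRecord`, `metadata_db`, `object_store`, and the key format are hypothetical, and plain in-memory dictionaries stand in for a real database and a real object store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PhotoRecord:
    """Structured, queryable facts about one upload (the metadata)."""
    photo_id: str
    user_id: str
    uploaded_at: datetime
    privacy: str            # assumed values: "public" or "private"
    processing_status: str  # assumed values: "pending" or "ready"
    object_key: str         # where the bytes live in object storage


# Stand-ins for real systems so the sketch stays self-contained.
metadata_db: dict[str, PhotoRecord] = {}
object_store: dict[str, bytes] = {}

image_bytes = b"\x89PNG...placeholder image bytes..."
key = "photos/u42/p1/original.png"  # hypothetical key layout

# The blob goes to object storage; the record only points at it.
object_store[key] = image_bytes
metadata_db["p1"] = PhotoRecord(
    photo_id="p1",
    user_id="u42",
    uploaded_at=datetime.now(timezone.utc),
    privacy="public",
    processing_status="ready",
    object_key=key,
)

# Queries run against the small records; bytes are only fetched by key.
public_keys = [r.object_key for r in metadata_db.values() if r.privacy == "public"]
```

The point of the sketch is the shape, not the storage engine: the record is small and filterable, and the only link between the two stores is `object_key`.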

Mental model

Label here. Box there. Shelf near the user.

Describe the file with a label, store the bytes in the box, and keep hot copies on a shelf near the user.
  • Label: metadata, the queryable facts
  • Box: object storage, the durable bytes
  • Shelf: CDN edge copy, for fast repeated reads
The sentence to remember is structural, not vendor-specific: query the label, store the box, serve hot copies from the shelf.
First principles

Different storage jobs come from different access patterns.

Core diagram

One clean content system: metadata in one lane, bytes in another, hot reads at the edge.

[Diagram: three lanes. Upload lane: the user (U) sends content through the App / API. Origin / control lane: object storage (image / video bytes, durable origin) and the metadata DB (owner, status, path). Read lane: the reader (R) requests content from the CDN edge (cached copies near users); a hit returns from the edge, a miss goes to origin.]

What each layer holds:
  • Metadata DB: title, owner, ACL, upload time
  • Object storage: original bytes, thumbnails, variants
  • CDN: temporary edge copies
Uploads write durable origin data and metadata. Reads should hit the edge first and fall back to origin only when needed.
Why it matters in interviews

Interviewers are listening for separation of responsibility.

Weak
We can store images in the database and add a CDN later if needed.
Strong
I would keep file metadata in the database because it is structured and queried. I would store the actual bytes in object storage because the files are large and need durable storage. If reads are public and repeated across regions, I would place a CDN in front of the origin.

The strong version sounds grounded because each layer has a job, and the jobs map directly to access patterns.

Key ideas

Six anchors.

Comparison

Origin and edge are not the same thing.

Object storage

The durable origin.

Holds the authoritative blob bytes. Optimized for cheap durable storage of large objects. Usually the place the CDN falls back to on a miss.

CDN

The fast temporary copy.

Keeps popular content closer to readers. Lowers latency and origin load. May evict content, expire content, or miss entirely.
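To make "may evict content, expire content, or miss entirely" concrete, here is a toy edge cache. It is a sketch under assumptions, not how any particular CDN works: the class name `EdgeCache`, the least-recently-used eviction policy, and the explicit `now` argument (used instead of a real clock) are all illustrative choices.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy edge cache: bounded size, per-entry TTL, falls back to origin."""

    def __init__(self, fetch_origin: Callable[[str], bytes],
                 ttl_seconds: float, max_entries: int) -> None:
        self.fetch_origin = fetch_origin
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        # key -> (body, expiry time); insertion order tracks recency of use
        self.entries: OrderedDict = OrderedDict()
        self.origin_fetches = 0

    def get(self, key: str, now: float) -> bytes:
        cached = self.entries.get(key)
        if cached is not None and cached[1] > now:
            self.entries.move_to_end(key)     # mark as recently used
            return cached[0]                  # hit: served from the edge
        # Miss or expired copy: fall back to the durable origin.
        body = self.fetch_origin(key)
        self.origin_fetches += 1
        self.entries[key] = (body, now + self.ttl_seconds)
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return body


origin = {"cat.jpg": b"cat", "dog.jpg": b"dog", "owl.jpg": b"owl"}
edge = EdgeCache(origin.__getitem__, ttl_seconds=60, max_entries=2)

edge.get("cat.jpg", now=0)    # miss: fetched from origin
edge.get("cat.jpg", now=10)   # hit: served from the edge
edge.get("cat.jpg", now=100)  # expired: fetched from origin again
edge.get("dog.jpg", now=101)  # miss
edge.get("owl.jpg", now=102)  # miss, and cat.jpg is evicted to make room
```

Nothing is lost when the edge drops `cat.jpg`: the origin still holds the authoritative bytes, which is why the edge can afford to be temporary.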

Speaking script

Lines for the content-storage conversation.

Opening
I want to separate metadata from file bytes because they have different storage and access patterns.
Sketching
The database holds structured facts about the file, while object storage holds the blob itself.
Deep dive
For reads, I'd put a CDN in front of the origin if the same content is fetched repeatedly or users are spread across regions.
Trade-off
A CDN improves latency and reduces origin load, but it adds another caching layer to operate, with stale copies and invalidation to think about, so I only add it when the traffic shape justifies it.
Extending
If the content is private, I may still use edge delivery, but I need authorization-aware access such as signed URLs or gated fetches.
Defending
The clean pattern is database for description, object storage for bytes, and CDN for repeated delivery.
Common mistakes

How candidates blur three different layers into one.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Files can live in the main database until the system gets large.

Better model

What to replace it with

Large objects and relational metadata have different access patterns, cost profiles, and delivery needs.

Interview move

What to do in the room

Store metadata in the application database, bytes in object storage, and popular public content near users through a CDN.

Trade-offs

Four storage shapes.

| Storage choice | Good when | Weak when | Interview line |
| --- | --- | --- | --- |
| Main database only | Files are tiny, rare, and the system is intentionally very simple. | Blobs are large, numerous, or downloaded often. | I'd only keep file bytes in the main database if they stay very small and simplicity matters more than scale. |
| Metadata DB + object storage (Default) | Files are large and durable, while metadata still needs structured queries. | The system barely stores blobs at all. | This is my default split: structured facts in the database, large bytes in object storage. |
| CDN in front of object storage (Default) | Content is read heavily, publicly, or across regions. | Reads are low, content is highly private, or edge caching gives little benefit. | I add the CDN when repeated global reads justify a fast edge layer in front of the origin. |
| No CDN yet | Traffic is low or readers are concentrated in one region. | Origin latency and bandwidth are becoming the bottleneck. | I'd skip the CDN in v1 if traffic is modest, then add it once repeated reads start paying for the edge layer. |
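One rough way to judge whether repeated reads "pay for the edge layer" is a back-of-envelope estimate of what still reaches the origin. The sketch below assumes the only requests that hit the origin are cache misses; the function name `origin_load` and every number in the example are made up for illustration.

```python
def origin_load(requests_per_day: int, avg_object_bytes: int,
                hit_ratio: float) -> dict:
    """Estimate daily origin traffic given an assumed edge hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Only misses travel back to the origin.
    misses = round(requests_per_day * (1.0 - hit_ratio))
    return {
        "origin_requests": misses,
        "origin_bytes": misses * avg_object_bytes,
    }


# Illustrative numbers only: 1M reads per day of 200 KB images.
no_cdn = origin_load(1_000_000, 200_000, hit_ratio=0.0)
with_cdn = origin_load(1_000_000, 200_000, hit_ratio=0.9)
```

With these assumed inputs, a 90% hit ratio cuts origin reads from 1,000,000 to 100,000 per day. If the expected hit ratio is low, because content is private or rarely re-read, the same arithmetic supports the "No CDN yet" row.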
Mini case study

Photo sharing — one upload path, one delivery path.

Upload path

  • User sends image bytes through the app.
  • App stores the object in object storage.
  • App writes metadata to the database.
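The upload steps above can be sketched as one function. This is an illustration under assumptions: `upload_photo`, the key layout, and the extra `size_bytes` and `sha256` fields are hypothetical, and dictionaries stand in for the object store and the database.

```python
import hashlib
import uuid

# In-memory stand-ins for object storage and the metadata database.
object_store: dict[str, bytes] = {}
metadata_db: dict[str, dict] = {}


def upload_photo(user_id: str, image_bytes: bytes) -> str:
    """Store the bytes, then record where they live. Returns the photo ID."""
    photo_id = uuid.uuid4().hex
    object_key = f"photos/{user_id}/{photo_id}/original"  # hypothetical layout

    # Step 2 of the list: the app stores the object in object storage.
    object_store[object_key] = image_bytes

    # Step 3 of the list: the app writes metadata, including the object location.
    metadata_db[photo_id] = {
        "user_id": user_id,
        "object_key": object_key,
        "size_bytes": len(image_bytes),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    return photo_id


photo_id = upload_photo("u42", b"abc")
```

Following the order in the list means a metadata row is only written once the bytes it points at exist.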

Read path

  • Readers request the image through the CDN.
  • If the edge has it, return fast.
  • If not, fetch from origin and cache the copy.

What evolves later

  • Thumbnails or resized variants become separate objects.
  • Private content needs signed URLs or authorization checks.
  • Image processing may move to an async pipeline.
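For the signed-URL item in the list above, one common approach is to attach an expiry time and an HMAC signature that the delivery layer can verify without a database lookup. The sketch below is a generic illustration, not the URL format of any real CDN: the secret, the query parameter names, and the helper names `sign_url` and `is_valid` are all assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-secret-not-for-production"  # hypothetical shared secret


def _signature(path: str, expires_at: int) -> str:
    message = f"{path}?expires={expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int) -> str:
    """Return a URL that is only valid for this path until expires_at."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at)}"


def is_valid(path: str, expires_at: int, signature: str, now: int) -> bool:
    """Check expiry first, then compare signatures in constant time."""
    if now >= expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at), signature)


url = sign_url("/videos/v1/720p.mp4", expires_at=1_700_000_600)
sig = url.rsplit("sig=", 1)[1]
```

Because the path and expiry are both covered by the signature, a link cannot be reused for a different object or extended past its expiry without the secret.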

Lesson

  • Query the metadata.
  • Store the bytes durably.
  • Serve repeated reads from the edge.
Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "The database stores video metadata and permissions; object storage holds the file; the CDN serves public content directly and private content through signed URLs."

We do

Complete the missing piece.

For short videos, draw three boxes: metadata store, object storage, and CDN. Label who reads each one.

You do

Answer without notes.

Answer the practice prompt by separating upload, metadata write, processing, and playback.

Practice

Try it before you read the model answer.

Prompt
Design a service for uploading and sharing short videos.
  • What goes in the database?
  • What goes in object storage?
  • When does a CDN become necessary?
Strong model answer
I'd keep structured metadata in the database, such as video ID, uploader, privacy setting, title, processing status, and the object location. I'd store the actual video bytes in object storage because the files are large and need durable blob storage. For playback, I'd put a CDN in front of the origin once reads become frequent or geographically distributed, so repeated playback requests do not always go back to the same storage origin.
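The role of the processing status in that answer can be sketched as follows. Everything here is an assumption for illustration: the status values, the helper names, and the `cdn.example.com` host are hypothetical, and a dictionary stands in for the metadata database.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UPLOADED = "uploaded"      # bytes are in object storage, not yet processed
    PROCESSING = "processing"  # assumed async transcoding step
    READY = "ready"            # safe to hand out a playback URL


videos: dict[str, dict] = {}  # stand-in for the metadata database


def record_upload(video_id: str, uploader: str, object_key: str) -> None:
    """Metadata write that follows the object-storage upload."""
    videos[video_id] = {
        "uploader": uploader,
        "object_key": object_key,
        "status": Status.UPLOADED,
    }


def set_status(video_id: str, status: Status) -> None:
    videos[video_id]["status"] = status


def playback_url(video_id: str, cdn_host: str = "cdn.example.com") -> Optional[str]:
    """Only point viewers at the CDN once processing has finished."""
    record = videos[video_id]
    if record["status"] is not Status.READY:
        return None
    return f"https://{cdn_host}/{record['object_key']}"


record_upload("v1", "u42", "videos/v1/original.mp4")
before = playback_url("v1")  # None: not ready yet
set_status("v1", Status.PROCESSING)
set_status("v1", Status.READY)
after = playback_url("v1")
```

The database decides whether playback is allowed; the CDN URL is derived from the stored object location rather than stored bytes.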
Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Memory hook
Query the label. Store the box. Serve from the edge.
Recap

Three things to take into the room.

1. Metadata and blobs are different jobs.

One is queried fields. The other is durable bytes.

2. The CDN is a speed layer, not a truth layer.

Origin remains the durable source. Edge holds temporary copies.

3. Separate upload from download in your explanation.

That alone makes the design sound much more disciplined.

Reusable interview line
"I would keep file metadata in the database, store the bytes in object storage, and add a CDN when repeated or global reads justify serving cached copies from the edge."