Part 2 · Core Building Blocks · Chapter 7

Storage systems and CDNs.

Where big files live, and how they get closer to users.

Learning objective

Distinguish clearly between metadata storage, object storage, and CDN delivery so you can explain large-file systems without mixing up queryable records, durable blobs, and fast global reads.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

Why are big files usually not stored in the same place as ordinary rows?

Commit

Write a rough answer.

Before reading, split the system into metadata, bytes, and delivery path.

Connect

Notice where it returns.

This split returns in video upload, profile photos, feeds with media, and private content delivery.

Concrete first

Storage is not one thing.

Candidates say "store the file" as if it were one decision. It usually isn't. Recording facts about a file and holding the file's bytes are different jobs.

In a photo or video product, the system needs to remember things like user ID, upload time, privacy setting, processing status, and object location. Those are structured records. It also needs to hold the image or video bytes, which can be large, must be stored durably, and are downloaded repeatedly. Those are blobs. The records and the blobs usually should not live in the same place.
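To make the split concrete, here is a minimal sketch of one photo as a record plus a blob. The class name `FileMetadata`, the field names, and the two dictionaries standing in for a database and an object store are all assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    """The structured record describing one uploaded file (hypothetical shape)."""
    file_id: str
    user_id: str
    uploaded_at: datetime
    privacy: str            # e.g. "public" or "private"
    processing_status: str  # e.g. "pending" or "ready"
    object_key: str         # where the bytes live in the blob store


# Stand-ins for the two stores: one holds records, the other holds raw bytes.
metadata_db: dict[str, FileMetadata] = {}
blob_store: dict[str, bytes] = {}

photo_bytes = b"\xff\xd8\xff" + b"\x00" * 1024  # fake image payload
record = FileMetadata(
    file_id="f1",
    user_id="u42",
    uploaded_at=datetime.now(timezone.utc),
    privacy="public",
    processing_status="ready",
    object_key="photos/u42/f1.jpg",
)
blob_store[record.object_key] = photo_bytes
metadata_db[record.file_id] = record

# The record points at the bytes; it does not contain them.
fetched = blob_store[metadata_db["f1"].object_key]
```

The record stays a few dozen bytes no matter how large the photo is, which is the point of keeping the two apart.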

Mental model

Label here. Box there. Shelf near the user.

Describe the file with a label, store the bytes in the box, and keep hot copies on a shelf near the user.

The sentence to remember is structural, not vendor-specific: query the label, store the box, serve hot copies from the shelf.

First principles

Different storage jobs come from different access patterns.

Metadata is small, structured, and often queried by fields like owner, time, or status.
Blob data is large, needs durable storage, and is usually fetched as a whole object rather than filtered with relational queries.
Global repeated reads benefit from copies near the user, which is the job of the CDN.
The database answers questions about the file. Object storage keeps the file. The CDN accelerates reads of copies.
Uploads and downloads are different paths. One design decision rarely serves both equally well.
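The difference in access patterns can be sketched with an in-memory SQLite table for metadata and a plain dictionary standing in for object storage. The table name `files`, its columns, and the helper names are assumptions for illustration.

```python
import sqlite3

# Metadata: small rows, queried by fields such as owner and status.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        file_id     TEXT PRIMARY KEY,
        owner_id    TEXT NOT NULL,
        uploaded_at TEXT NOT NULL,
        status      TEXT NOT NULL,
        object_key  TEXT NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_files_owner_status ON files (owner_id, status)")

rows = [
    ("f1", "u42", "2024-01-01T10:00:00Z", "ready", "videos/u42/f1.mp4"),
    ("f2", "u42", "2024-01-02T09:30:00Z", "processing", "videos/u42/f2.mp4"),
    ("f3", "u7", "2024-01-03T12:00:00Z", "ready", "videos/u7/f3.mp4"),
]
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)

# Blobs: fetched whole by key, never filtered. A dict stands in for object storage.
blobs = {row[4]: b"<video bytes>" for row in rows}


def ready_keys_for(owner_id: str) -> list[str]:
    """Ask the database a question about files; get object keys back."""
    cur = conn.execute(
        "SELECT object_key FROM files "
        "WHERE owner_id = ? AND status = 'ready' ORDER BY uploaded_at",
        (owner_id,),
    )
    return [row[0] for row in cur.fetchall()]


def fetch_blob(object_key: str) -> bytes:
    """Object storage only answers one question: give me the bytes for this key."""
    return blobs[object_key]
```

The database does the filtering and sorting; the blob store is only ever asked for one key at a time.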

Core diagram

One clean content system: metadata in one lane, bytes in another, hot reads at the edge.

Uploads write durable origin data and metadata. Reads should hit the edge first and fall back to origin only when needed.

Why it matters in interviews

Interviewers are listening for separation of responsibility.

Weak

We can store images in the database and add a CDN later if needed.

Strong

I would keep file metadata in the database because it is structured and queried. I would store the actual bytes in object storage because the files are large and need durable storage. If reads are public and repeated across regions, I would place a CDN in front of the origin.

The strong version sounds grounded because each layer has a job, and the jobs map directly to access patterns.

Key ideas

Six anchors.

Metadata and file bytes usually have different storage needs.
Object storage is durable origin storage, not a query engine.
A CDN serves copies near users. It is not the source of truth.
Read-heavy global content benefits the most from the CDN layer.
Private or rarely accessed content may not justify a CDN immediately.
Always talk separately about upload flow and download flow.

Comparison

Origin and edge are not the same thing.

Object storage

The durable origin.

Holds the authoritative blob bytes. Optimized for cheap durable storage of large objects. Usually the place the CDN falls back to on a miss.

CDN

The fast temporary copy.

Keeps popular content closer to readers. Lowers latency and origin load. May evict content, expire content, or miss entirely.
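The origin and edge roles can be sketched as a small read-through cache. This is a toy model, assuming a least-recently-used eviction policy and a fixed time-to-live; real CDNs choose their own policies, and the class name `EdgeCache` and its parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy edge node: serves cached copies, falls back to the origin on a miss."""

    def __init__(self, origin_fetch: Callable[[str], bytes],
                 capacity: int, ttl_seconds: float):
        self._origin_fetch = origin_fetch
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._entries = OrderedDict()  # key -> (bytes, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, now: float) -> bytes:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            # Fresh copy at the edge: no origin traffic at all.
            self._entries.move_to_end(key)
            self.hits += 1
            return entry[0]

        # Missing or expired: the origin is still the source of truth.
        self.misses += 1
        data = self._origin_fetch(key)
        self._entries[key] = (data, now + self._ttl)
        self._entries.move_to_end(key)

        # Evict least recently used copies when the edge runs out of room.
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return data


# Assumed origin: a dict standing in for object storage.
origin = {"a.jpg": b"A", "b.jpg": b"B", "c.jpg": b"C"}
edge = EdgeCache(origin_fetch=origin.__getitem__, capacity=2, ttl_seconds=60)
```

Losing an edge entry to eviction or expiry costs one extra origin fetch, never the data itself, which is why the edge can be treated as disposable.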

Speaking script

Lines for the content-storage conversation.

Opening

I want to separate metadata from file bytes because they have different storage and access patterns.

Sketching

The database holds structured facts about the file, while object storage holds the blob itself.

Deep dive

For reads, I'd put a CDN in front of the origin if the same content is fetched repeatedly or users are spread across regions.

Trade-off

A CDN improves latency and reduces origin load, but it adds another caching layer to operate and keep fresh, so I only add it when the traffic shape justifies it.

Extending

If the content is private, I may still use edge delivery, but I need authorization-aware access such as signed URLs or gated fetches.
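One way to picture a signed URL is an expiry time plus an HMAC over the path, checked before the bytes are served. The sketch below is a generic scheme for illustration only; it is not any particular CDN's signing format, and the secret, parameter names, and function names are assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; a real key would come from a secret store


def _signature(path: str, expires_at: int) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int) -> str:
    """Issued by the app after it has checked the user's permission."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at)})
    return f"{path}?{query}"


def verify_url(url: str, now: int) -> bool:
    """Run at the delivery layer before returning any bytes."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False  # link has expired
    expected = _signature(parsed.path, expires_at)
    return hmac.compare_digest(signature, expected)
```

The application decides who may see the file; the delivery layer only has to verify that the link is untampered and unexpired.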

Defending

The clean pattern is database for description, object storage for bytes, and CDN for repeated delivery.

Common mistakes

How candidates blur three different layers into one.

Storing large media files in the main relational database without a strong reason.
Using the word "storage" without saying whether they mean metadata, blob bytes, or edge copies.
Saying "use S3" or "use a CDN" without naming its exact role.
Forgetting that the CDN serves copies, not the durable original.
Adding a CDN automatically even when the system is small, private, or not geographically distributed.
Ignoring that upload and download paths have different bottlenecks.
Forgetting that processing steps like thumbnails or transcoding often happen asynchronously after upload.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Files can live in the main database until the system gets large.

Better model

What to replace it with

Large objects and relational metadata have different access patterns, cost profiles, and delivery needs.

Interview move

What to do in the room

Store metadata in the application database, bytes in object storage, and popular public content near users through a CDN.

Trade-offs

Four storage shapes.

Storage choice	Good when	Weak when	Interview line
Main database only	Files are tiny, rare, and the system is intentionally very simple.	Blobs are large, numerous, or downloaded often.	I'd only keep file bytes in the main database if they stay very small and simplicity matters more than scale.
Metadata DB + object storage (default)	Files are large and need durable storage, while metadata still needs structured queries.	The system barely stores blobs at all.	This is my default split: structured facts in the database, large bytes in object storage.
CDN in front of object storage (default)	Content is read heavily, publicly, or across regions.	Reads are low, content is highly private, or edge caching gives little benefit.	I add the CDN when repeated global reads justify a fast edge layer in front of the origin.
No CDN yet	Traffic is low or readers are concentrated in one region.	Origin latency and bandwidth are becoming the bottleneck.	I'd skip the CDN in v1 if traffic is modest, then add it once repeated reads start paying for the edge layer.
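Whether the edge layer "pays for itself" comes down to simple arithmetic on the cache hit ratio. The sketch below assumes a miss costs the edge round trip plus an origin round trip; the function names and all numbers in the usage are hypothetical, not measurements.

```python
def origin_requests(total_requests: int, hit_ratio: float) -> float:
    """Requests that still reach the origin after the edge absorbs the hits."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_requests * (1.0 - hit_ratio)


def average_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Expected read latency: hits pay edge_ms, misses pay edge_ms + origin_ms."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * (edge_ms + origin_ms)


# Hypothetical traffic: one million reads a day, 20 ms to the edge, 200 ms to origin.
for ratio in (0.0, 0.5, 0.9):
    load = origin_requests(1_000_000, ratio)
    latency = average_latency_ms(ratio, edge_ms=20, origin_ms=200)
    print(f"hit ratio {ratio:.0%}: ~{load:,.0f} origin reads, ~{latency:.0f} ms average")
```

With a hit ratio near zero, as with rarely repeated or highly private content, the edge adds a hop without removing origin load, which is the "weak when" column in numbers.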

Mini case study

Photo sharing — one upload path, one delivery path.

Upload path

User sends image bytes through the app.
App stores the object in object storage.
App writes metadata to the database.
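The three upload steps can be sketched as one function. This is a simplified model with dictionaries standing in for object storage and the database; the names `upload_image`, `object_store`, and `metadata_db` and the key layout are assumptions. It follows the order above: bytes first, then the record.

```python
import hashlib
import uuid

object_store: dict[str, bytes] = {}   # stand-in for object storage
metadata_db: dict[str, dict] = {}     # stand-in for the application database


def upload_image(user_id: str, image_bytes: bytes) -> str:
    """Store the bytes, then record where they went. Returns the new file ID."""
    file_id = uuid.uuid4().hex
    object_key = f"images/{user_id}/{file_id}"

    # Step 2: write the object. Doing this before the metadata write means
    # a record never points at bytes that are not there yet.
    object_store[object_key] = image_bytes

    # Step 3: write the structured facts, including the object location.
    metadata_db[file_id] = {
        "user_id": user_id,
        "object_key": object_key,
        "size_bytes": len(image_bytes),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "status": "uploaded",
    }
    return file_id


new_id = upload_image("u42", b"fake image bytes")
```

If the metadata write fails after the object write, the result is an orphaned object that can be cleaned up later, which is usually easier to tolerate than a record with no bytes behind it.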

Read path

Readers request the image through the CDN.
If the edge has it, return fast.
If not, fetch from origin and cache the copy.

What evolves later

Thumbnails or resized variants become separate objects.
Private content needs signed URLs or authorization checks.
Image processing may move to an async pipeline.
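The async pipeline and the "variants as separate objects" idea can be sketched with a background worker and a queue. Here `make_variant` is a placeholder that truncates bytes instead of resizing an image, and the key layout, variant names, and status values are assumptions.

```python
import queue
import threading

object_store = {"images/f1/original": b"x" * 1000}  # stand-in for object storage
status = {"f1": "processing"}                        # stand-in for a metadata field
jobs: queue.Queue = queue.Queue()

VARIANTS = {"thumb": 100, "medium": 500}  # hypothetical variant name -> size budget


def make_variant(data: bytes, max_bytes: int) -> bytes:
    """Placeholder for real image resizing: just truncates the payload."""
    return data[:max_bytes]


def worker() -> None:
    while True:
        file_id = jobs.get()
        if file_id is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        original = object_store[f"images/{file_id}/original"]
        for name, max_bytes in VARIANTS.items():
            # Each variant is written as its own object next to the original.
            object_store[f"images/{file_id}/{name}"] = make_variant(original, max_bytes)
        status[file_id] = "ready"    # metadata flips once all variants exist
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

# The upload request only enqueues work; it does not wait for processing.
jobs.put("f1")
jobs.put(None)
jobs.join()  # only here so the example finishes deterministically
```

The upload can return as soon as the original is stored, and readers use the processing status in the metadata to know when variants are available.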

Lesson

Query the metadata.
Store the bytes durably.
Serve repeated reads from the edge.

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "The database stores video metadata and permissions; object storage holds the file; the CDN serves public content directly and private content only through signed URLs."

We do

Complete the missing piece.

For short videos, draw three boxes: metadata store, object storage, and CDN. Label who reads each one.

You do

Answer without notes.

Answer the practice prompt by separating upload, metadata write, processing, and playback.

Practice

Try it before you read the model answer.

Prompt

Design a service for uploading and sharing short videos.

What goes in the database?
What goes in object storage?
When does a CDN become necessary?

Strong model answer

I'd keep structured metadata in the database, such as video ID, uploader, privacy setting, title, processing status, and the object location. I'd store the actual video bytes in object storage because the files are large and need durable blob storage. For playback, I'd put a CDN in front of the origin once reads become frequent or geographically distributed, so repeated playback requests do not always go back to the same storage origin.

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Metadata and blobs are different jobs.

One is queried fields. The other is durable bytes.

The CDN is a speed layer, not a truth layer.

Origin remains the durable source. Edge holds temporary copies.

Separate upload from download in your explanation.

That alone makes the design sound much more disciplined.

Reusable interview line

"I would keep file metadata in the database, store the bytes in object storage, and add a CDN when repeated or global reads justify serving cached copies from the edge."