Where big files live, and how they get closer to users.
Why are big files usually not stored in the same place as ordinary rows?
Before reading, split the system into metadata, bytes, and delivery path.
This split returns in video upload, profile photos, feeds with media, and private content delivery.
Candidates say "store the file" as if that is one decision. It usually isn't. The facts about a file and the file bytes themselves are different jobs.
In a photo or video product, the system needs to remember things like user ID, upload time, privacy setting, processing status, and object location. Those are structured records. It also needs to hold the image or video bytes, which can be large, must be stored durably, and are downloaded repeatedly. Those are blobs. The two usually should not live in the same place.
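To make the split concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a prescribed schema: the `MediaRecord` fields mirror the facts listed above, `InMemoryObjectStore` is a stand-in for a real object storage service, `metadata_db` is a plain dict standing in for the application database, and the `uploads/<user>/<id>` key layout is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import uuid


@dataclass
class MediaRecord:
    """Structured facts about one file: small and queryable."""
    media_id: str
    user_id: str
    uploaded_at: datetime
    privacy: str            # e.g. "public" or "private"
    processing_status: str  # e.g. "pending" or "ready"
    object_key: str         # where the bytes live in object storage


class InMemoryObjectStore:
    """Stand-in for an object storage service: opaque bytes addressed by key."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_upload(user_id: str, data: bytes, privacy: str,
                metadata_db: dict, object_store: InMemoryObjectStore) -> MediaRecord:
    """Write the bytes to object storage, then record the facts in the database."""
    media_id = uuid.uuid4().hex
    object_key = f"uploads/{user_id}/{media_id}"  # hypothetical key layout
    object_store.put(object_key, data)            # large bytes go to the blob store
    record = MediaRecord(
        media_id=media_id,
        user_id=user_id,
        uploaded_at=datetime.now(timezone.utc),
        privacy=privacy,
        processing_status="pending",
        object_key=object_key,
    )
    metadata_db[media_id] = record                # small row goes to the database
    return record
```

Note that the record never carries the bytes, only the `object_key` that points at them. That pointer is what lets the two stores scale and be priced independently.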
The strong version sounds grounded because each layer has a job, and the jobs map directly to access patterns.
Object storage: Holds the authoritative blob bytes. Optimized for cheap durable storage of large objects. Usually the place the CDN falls back to on a miss.
CDN: Keeps popular content closer to readers. Lowers latency and origin load. May evict content, expire content, or miss entirely.
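The edge behaviour described above (serve a cached copy if one is fresh, otherwise fall back to the origin) can be sketched as a small read-through cache. This is a toy model, not how any particular CDN is implemented: the `EdgeCache` class, its TTL-based expiry, and the `origin_fetches` counter are all assumptions made for illustration.

```python
import time
from typing import Callable


class EdgeCache:
    """Toy edge node: holds temporary copies and falls back to origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[bytes, float]] = {}  # key -> (data, expires_at)
        self.origin_fetches = 0  # how often the origin had to serve a read

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                  # hit: served from the edge
        data = self._fetch(key)              # miss or expired: go to origin
        self.origin_fetches += 1
        self._entries[key] = (data, now + self._ttl)
        return data

    def evict(self, key: str) -> None:
        """The edge may drop a copy at any time; the origin still has the bytes."""
        self._entries.pop(key, None)
```

Because every miss is answered by the origin, losing an edge copy costs latency and origin load but never data. That is why the edge can be treated as disposable while object storage cannot.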
Weak answer: "Files can live in the main database until the system gets large."
Why it falls short: Large objects and relational metadata have different access patterns, cost profiles, and delivery needs.
Strong answer: "Store metadata in the application database, bytes in object storage, and popular public content near users through a CDN."
| Storage choice | Good when | Weak when | Interview line |
|---|---|---|---|
| Main database only | Files are tiny, rare, and the system is intentionally very simple. | Blobs are large, numerous, or downloaded often. | I'd only keep file bytes in the main database if they stay very small and simplicity matters more than scale. |
| Metadata DB + object storage (default) | Files are large and durable, while metadata still needs structured queries. | The system barely stores blobs at all. | This is my default split: structured facts in the database, large bytes in object storage. |
| CDN in front of object storage (default) | Content is read heavily, publicly, or across regions. | Reads are low, content is highly private, or edge caching gives little benefit. | I add the CDN when repeated global reads justify a fast edge layer in front of the origin. |
| No CDN yet | Traffic is low or readers are concentrated in one region. | Origin latency and bandwidth are becoming the bottleneck. | I'd skip the CDN in v1 if traffic is modest, then add it once repeated reads start paying for the edge layer. |
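One way to rehearse the table is to write its rows down as a tiny decision function. The sketch below is a deliberate simplification: the `Workload` fields and the `choose_layers` name are hypothetical, the inputs are yes/no judgments rather than measured thresholds, and it treats highly private content as a reason to skip the CDN even though signed delivery through an edge is possible.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tiny_rare_and_simple: bool        # files are tiny, rare, and the system is kept simple
    read_heavy_or_multi_region: bool  # repeated reads, public content, or global readers
    highly_private: bool = False      # edge caching gives little benefit


def choose_layers(w: Workload) -> list[str]:
    """Return the storage layers the table would suggest for this workload."""
    if w.tiny_rare_and_simple:
        return ["main database"]
    layers = ["metadata database", "object storage"]  # the default split
    if w.read_heavy_or_multi_region and not w.highly_private:
        layers.append("CDN")                          # added once reads justify it
    return layers
```

For example, `choose_layers(Workload(False, False))` gives the "no CDN yet" row: the default split without an edge layer.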
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would say: "The database stores video metadata and permissions; object storage holds the file; the CDN serves safe public or signed content."
For short videos, draw three boxes: metadata store, object storage, and CDN. Label who reads each one.
Answer the practice prompt by separating upload, metadata write, processing, and playback.
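The model answer mentions signed content. For the playback step, one common pattern is a short-lived signed URL: the application checks permissions in the metadata store, then hands the client a URL that the delivery layer can verify without calling back. The sketch below uses an HMAC over the object key and an expiry time. It is not any CDN vendor's actual signing format; `SECRET`, `sign_url`, `verify_url`, the `expires` and `sig` query parameters, and the `cdn.example.com` host are all hypothetical, and object keys are assumed to be URL-safe.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical shared key; a real one would come from a secret store


def _signature(object_key: str, expires_at: int, secret: bytes) -> str:
    payload = f"{object_key}:{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, object_key: str, expires_at: int,
             secret: bytes = SECRET) -> str:
    """Build a URL that is valid only until expires_at (Unix seconds)."""
    query = urlencode({"expires": expires_at,
                       "sig": _signature(object_key, expires_at, secret)})
    return f"{base_url}/{object_key}?{query}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Check expiry and signature, as the delivery layer would before serving bytes."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False                     # missing or malformed parameters
    if now >= expires_at:
        return False                     # link has expired
    object_key = parsed.path.lstrip("/")
    expected = _signature(object_key, expires_at, secret)
    return hmac.compare_digest(sig, expected)  # constant-time comparison
```

A usage example: `sign_url("https://cdn.example.com", "videos/abc.mp4", expires_at)` produces the link handed to the player, and `verify_url(link, now)` rejects it once `now` reaches `expires_at` or if the path has been altered.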
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
Metadata is queried fields. The blob is durable bytes.
The origin remains the durable source. The edge holds temporary copies.
Stating those two distinctions alone makes the design sound much more disciplined.