Articulet · System Design, Made Clear
Part 4 · Real Interview Systems · Chapter 16 · Video Upload / YouTube-Lite

The video upload.

Accepting large uploads, processing them safely, and serving playback at scale.

Learning objective
Design a video upload and playback system by separating upload acceptance, metadata storage, object storage, asynchronous processing, and CDN delivery, and explain why each stage exists.
Before you read

Make a prediction first.

Predict

Answer before the explanation.

Which part of video upload is user-facing, and which part is background processing?

Commit

Write a rough answer.

Before reading, split the design into upload, storage, processing, metadata, and playback.

Connect

Notice where it returns.

Video combines object storage, queues, workers, CDN, metadata, permissions, and reliability.

Plain English

This is really three systems sharing one product.

Video systems look intimidating because the files are large and the pipeline has several stages. The clean way to think about them is simple: accept the upload, store the raw file durably, process it into playable formats, then serve playback efficiently.

That sequence matters. You do not want to make the user wait for full transcoding before acknowledging the upload. Upload, processing, and playback are different traffic shapes and should not be treated as one undifferentiated path.

Reasonable v1 scope
  • Upload a video.
  • Store metadata.
  • Transcode into a few playback renditions.
  • Make the video playable.
Layer on later
  • Thumbnails.
  • Captions.
  • Moderation.
  • Recommendations.
  • Analytics.

The core design move is separating what the user is waiting for from the heavy work the system can do later.
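A minimal sketch of that separation, assuming in-memory stand-ins for object storage, the metadata database, and the job queue. The names `accept_upload`, `object_store`, `metadata_db`, and `transcode_queue` are hypothetical and exist only for illustration.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins for object storage, the metadata DB, and a job queue.
object_store: dict[str, bytes] = {}
metadata_db: dict[str, dict] = {}
transcode_queue: "queue.Queue[str]" = queue.Queue()


def accept_upload(owner_id: str, title: str, raw_bytes: bytes) -> dict:
    """Store the raw file, record metadata, enqueue processing, and return at once."""
    video_id = uuid.uuid4().hex
    raw_key = f"raw/{video_id}"

    # 1. Durable receipt: the raw bytes land in object storage first.
    object_store[raw_key] = raw_bytes

    # 2. The metadata row points at the blob and starts in a not-yet-playable state.
    metadata_db[video_id] = {
        "owner_id": owner_id,
        "title": title,
        "raw_key": raw_key,
        "status": "processing",
    }

    # 3. Heavy work is deferred: only a small job message is enqueued.
    transcode_queue.put(video_id)

    # 4. The user gets an acknowledgement without waiting for transcoding.
    return {"video_id": video_id, "status": "processing"}
```

The response carries a status rather than a playable URL, which is the point: what the user waits for ends at step 4.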

Why it matters in interviews

This problem ties together storage, async work, and global delivery.

Interviewers like video upload because it pulls together object storage, queues, asynchronous workers, metadata versus blob storage, and CDN-backed playback in one concrete system.

Weak opener
User uploads the video and we store it and then stream it.
Strong opener
I want the upload path to acknowledge durable receipt quickly, then trigger asynchronous transcoding and thumbnail generation. Metadata lives in the database, raw and processed video files live in object storage, and playback should come through a CDN.

The stronger answer separates the user-facing step, the background pipeline, and the playback delivery path.

Mental model

Check in, process, then distribute.

Think of airport baggage: check in, process in the back, then distribute.
  • Check in: upload accepted.
  • Process: transcode, thumbnail, status.
  • Distribute: CDN playback, serve from the edge.
The upload is the check-in step, transcoding is back-room processing, and playback is an edge-distribution problem.
Key ideas

Six anchors.

Core diagram

Draw upload, processing, and playback as separate lanes.

UPLOAD PATH
  • Uploader → Upload API (acknowledge durable receipt)
  • Upload API → Object storage (raw upload; blob bytes)
  • Upload API → Metadata DB
PROCESSING PATH
  • Upload event → Transcoding workers (renditions + thumbnails) → Processed objects (mp4, hls, thumbnails)
PLAYBACK PATH
  • Viewer → CDN edge (playback) → Origin storage (processed renditions; video bytes only)
The user waits for upload acceptance, not full processing. The playback path is a separate read-heavy system with its own optimization point.
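The processing lane can be sketched as a worker that drains a queue and flips a status field in the metadata. This is an illustration under assumptions: `fake_transcode` is a placeholder rather than a real transcoder, the two-step rendition ladder is invented, and an in-memory queue stands in for a message broker.

```python
import queue

# Hypothetical stand-ins; a real system would use a message broker and a transcoder.
object_store = {"raw/v1": b"raw-video-bytes"}
metadata_db = {"v1": {"raw_key": "raw/v1", "status": "processing", "renditions": {}}}
jobs: "queue.Queue[str]" = queue.Queue()

RENDITIONS = ("360p", "720p")  # assumed rendition ladder


def fake_transcode(raw: bytes, rendition: str) -> bytes:
    """Placeholder for a real transcoder call; it only tags the bytes."""
    return rendition.encode() + b":" + raw


def process_one(video_id: str) -> None:
    """Transcode one video and flip its metadata status when finished."""
    record = metadata_db[video_id]
    try:
        raw = object_store[record["raw_key"]]
        for rendition in RENDITIONS:
            key = f"processed/{video_id}/{rendition}"
            object_store[key] = fake_transcode(raw, rendition)
            record["renditions"][rendition] = key
        record["status"] = "ready"
    except Exception:
        # A failed job leaves the raw upload intact so it can be retried.
        record["status"] = "failed"


def drain(job_queue: "queue.Queue[str]") -> None:
    """Run queued jobs until the queue is empty (a worker loop, simplified)."""
    while True:
        try:
            video_id = job_queue.get_nowait()
        except queue.Empty:
            return
        process_one(video_id)


jobs.put("v1")
drain(jobs)
```

Because the worker reads the raw object and writes new processed objects, a retry after failure never touches what the upload path already acknowledged.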
Speaking script

Lines you can actually say out loud.

Opening
I want the user-facing upload path to acknowledge durable receipt quickly, not wait for full processing.
Sketching
The metadata goes in the database, while the raw video and processed renditions live in object storage.
Deep dive
Transcoding is slow and should happen asynchronously through a queue and worker pool.
Deep dive
Playback is read-heavy and globally distributed, so I would serve processed video through a CDN.
Extending
If uploads are very large, I would consider direct upload to object storage so the application tier does not proxy every byte.
Defending
The trade-off is a cleaner, faster user-facing path in exchange for more asynchronous pipeline complexity.
Common mistakes

Predictable ways this answer goes wrong.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Video upload is mainly a large file upload problem.

Better model

What to replace it with

Video upload is a pipeline: accept bytes, store safely, transcode asynchronously, update metadata, and serve playback with the right access rules.

Interview move

What to do in the room

Draw ingestion and playback as separate paths, then deep dive on processing or authorization.

Trade-offs

The decisions that come up every time.

Video design choice · Good when · Weak when · Interview line

App server proxies the full upload
  • Good when: The system is small and simplicity matters more than throughput.
  • Weak when: Upload volume or file size becomes large enough to overload the app tier.
  • Interview line: I would proxy uploads only while scale is small; large files push me toward direct object-storage upload.

Direct upload to object storage (default)
  • Good when: Uploads are large and you want the app tier out of the heavy byte path.
  • Weak when: You need very tight inline validation before any bytes are stored.
  • Interview line: Direct upload keeps the heavy file transfer away from the core application servers.

Synchronous processing
  • Good when: The files are tiny and immediate readiness matters more than throughput.
  • Weak when: Transcoding is expensive and users would wait too long.
  • Interview line: I would not make the user wait for full transcoding unless the processing is trivial.

Async transcoding pipeline (default)
  • Good when: Processing is slow, bursty, or parallelizable.
  • Weak when: The product requires instant ready-to-play output from the same request.
  • Interview line: Async processing keeps the upload path fast while letting transcoding scale independently.

CDN playback
  • Good when: Videos are public and read-heavy across regions.
  • Weak when: Playback is tiny, private, or too low-traffic to justify the edge layer yet.
  • Interview line: A CDN is high leverage because playback reads dominate and media files are large.
Deep dive

Should the app tier proxy the upload, or should the client upload directly?

This is one of the most useful design choices to explain out loud. The real question is where you want the heavy byte path to go.

  • Proxy through app tier (simpler, but the app handles every byte): Uploader → App tier → Object storage (raw upload).
  • Direct to object storage (app issues permission, storage takes the bytes): the Uploader gets a signed upload from the Upload API, then the heavy byte path goes straight to Object storage.
Proxying is simpler early. Direct upload usually wins once files get large enough that the application tier should stop sitting in the middle of every byte transfer.
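One way to picture "the app issues permission, storage takes the bytes" is an HMAC-signed URL with an expiry. The sketch below is not any particular provider's scheme: the host `storage.example.com`, the shared `SECRET`, the query parameter names, and both function names are assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # assumption: shared between the app tier and storage
STORAGE_HOST = "https://storage.example.com"  # hypothetical endpoint


def _signature(method: str, key: str, expires: int) -> str:
    """Sign the method, object key, and expiry so none of them can be altered."""
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_upload_url(key: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """App tier: grant permission to PUT one object until the expiry time."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature("PUT", key, expires)})
    return f"{STORAGE_HOST}/{key}?{query}"


def storage_accepts(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the request only if the signature matches and is unexpired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    key = parsed.path.lstrip("/")
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(method, key, expires))
```

The app tier only computes a small signature; it never sees the video bytes. The weak spot named in the trade-offs still applies: validation of the file contents has to happen after the bytes are stored.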
Mini case study

A newly uploaded video becomes popular after being shared.

This is the playback pressure test. Sharing multiplies viewers, not uploads, so the new load lands on the read path.

What gets hot

  • Playback reads.
  • Not the original upload path.

What helps first

  • Serve processed renditions from a CDN.
  • Keep object storage as durable origin.
  • Keep metadata reads separate from video byte delivery.

What should not happen

  • Every viewer pulls video directly from origin without edge caching.
  • App servers sit in the middle of every playback request.

The lesson

  • At scale, playback often becomes the dominant path.
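A toy simulation of why the edge layer helps first. It assumes a single edge location with no eviction or expiry, and the classes `Origin` and `EdgeCache` are hypothetical stand-ins, not a real CDN API.

```python
from collections import Counter


class Origin:
    """Durable origin (object storage stand-in) that counts how often it is read."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects
        self.reads: Counter = Counter()

    def get(self, key: str) -> bytes:
        self.reads[key] += 1
        return self.objects[key]


class EdgeCache:
    """One CDN edge location: serve from cache, fall back to origin on a miss."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        body = self.origin.get(key)
        self.cache[key] = body
        return body


SEGMENT = "processed/v1/720p/seg0.ts"  # hypothetical object key
origin = Origin({SEGMENT: b"segment-bytes"})
edge = EdgeCache(origin)
for _ in range(10_000):  # a burst of viewers asking for the same segment
    edge.get(SEGMENT)

print(edge.hits, edge.misses, origin.reads[SEGMENT])  # prints "9999 1 1"
```

Ten thousand viewer requests turn into one origin read. Real edges have many locations, limited capacity, and expiry, so the ratio is less extreme, but the direction of the effect is the same.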
Demo conversation

How a strong exchange sounds.

Interviewer
What are the main lanes in this system?
Candidate
Upload, processing, and playback. They have very different latency needs, so I do not want one path to carry the burden of all three jobs.
Interviewer
Would you proxy all video uploads through your app servers?
Candidate
Not at larger scale. I would keep metadata control in the app layer, but let the heavy video bytes go directly to object storage so the app tier does not become the bandwidth bottleneck.
Interviewer
What do you optimize first for viewers?
Candidate
Playback readiness and delivery efficiency. That means processed renditions, segmented delivery, and CDN distribution matter much earlier than fancy upload-side features.
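If the processed renditions are packaged as HLS (one of the processed object types in the core diagram), "processed renditions plus segmented delivery" shows up concretely as a master playlist that points the player at each rendition. A minimal sketch: the `Rendition` type, the bitrates, and the `name/index.m3u8` path layout are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str           # e.g. "360p"
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


def master_playlist(renditions: list[Rendition]) -> str:
    """Render an HLS master playlist listing each rendition, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda item: item.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")  # hypothetical path layout
    return "\n".join(lines) + "\n"


playlist = master_playlist(
    [
        Rendition("720p", 1280, 720, 2_800_000),
        Rendition("360p", 640, 360, 800_000),
    ]
)
print(playlist)
```

The playlist and the segments it leads to are static files, which is why they cache well at the edge.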
Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "Upload success means the original file is durably stored; playback readiness can happen later after transcoding."

We do

Complete the missing piece.

For private videos, mark which checks happen before upload, before metadata read, and before playback.

You do

Answer without notes.

Answer the practice prompt with two paths: upload pipeline and authorized playback.

Practice

Try it before you read the model answer.

Prompt
Design a video system that supports private videos visible only to authorized users.
  • What stays the same?
  • What changes on the playback path?
  • What trade-off appears?
Strong model answer
The upload, metadata, object storage, and asynchronous transcoding pipeline stay mostly the same. The main change is on playback: I now need authorization before serving the media, which may mean signed URLs, short-lived access tokens, or another controlled origin access pattern in front of the CDN. The trade-off is stronger access control versus a slightly more complex playback flow and edge-caching strategy.
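The playback change in that answer can be sketched as two checks: the app decides who may watch, then hands out a short-lived token that the edge can verify without calling the database. The `VIDEOS` table, the `SECRET`, the token format, and all function names here are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-playback-secret"  # assumption: shared by the app tier and the edge

# Hypothetical metadata: who may watch each video.
VIDEOS = {
    "v1": {"visibility": "private", "owner": "alice", "allowed": {"bob"}},
    "v2": {"visibility": "public", "owner": "alice", "allowed": set()},
}


def can_watch(user_id: Optional[str], video_id: str) -> bool:
    """Authorization check made by the app tier against metadata."""
    video = VIDEOS.get(video_id)
    if video is None:
        return False
    if video["visibility"] == "public":
        return True
    return user_id is not None and (
        user_id == video["owner"] or user_id in video["allowed"]
    )


def _sign(video_id: str, expires: int) -> str:
    return hmac.new(SECRET, f"{video_id}\n{expires}".encode(), hashlib.sha256).hexdigest()


def issue_playback_token(
    user_id: Optional[str], video_id: str, ttl_seconds: int = 300, now: Optional[float] = None
) -> Optional[str]:
    """Return a short-lived token only when the viewer is authorized."""
    if not can_watch(user_id, video_id):
        return None
    expires = int(time.time() if now is None else now) + ttl_seconds
    return f"{expires}.{_sign(video_id, expires)}"


def edge_allows(video_id: str, token: str, now: Optional[float] = None) -> bool:
    """Edge-side check: valid signature for this video and not yet expired."""
    try:
        expires_text, sig = token.split(".", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _sign(video_id, expires))
```

The token in this sketch is bound to the video, not to the viewer, so anyone holding it can play until it expires. That is the trade-off in miniature: a shorter lifetime tightens access control but makes the playback flow and edge caching more complex.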
Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Memory hook
Check in, process, then distribute.
Recap

Three things to take into the room.

1

Acknowledge early.

The user should not wait for transcoding.

2

Process later.

Transcoding, thumbnails, and status updates belong in the background pipeline.

3

Serve from the edge.

Playback is where the read-heavy scale shows up.

Reusable interview line
"I would separate upload acceptance, background processing, and playback delivery: acknowledge durable receipt early, transcode asynchronously, and serve processed renditions through a CDN."