Part 4 · Real Interview Systems · Chapter 16

The video upload.

Accepting large uploads, processing them safely, and serving playback at scale.

Learning objective

Design a video upload and playback system by separating upload acceptance, metadata storage, object storage, asynchronous processing, and CDN delivery, and explain why each stage exists.

Before you read

Make a prediction first.

Predict

Answer before the explanation.

Which part of video upload is user-facing, and which part is background processing?

Commit

Write a rough answer.

Before reading, split the design into upload, storage, processing, metadata, and playback.

Connect

Notice where it returns.

Video combines object storage, queues, workers, CDN, metadata, permissions, and reliability.

Plain English

This is really three systems sharing one product.

Video systems look intimidating because the files are large and the pipeline has several stages. The clean way to think about them is simple: accept the upload, store the raw file durably, process it into playable formats, then serve playback efficiently.

That sequence matters. You do not want to make the user wait for full transcoding before acknowledging the upload. Upload, processing, and playback are different traffic shapes and should not be treated as one undifferentiated path.
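A minimal sketch of that split, assuming in-memory stand-ins for object storage, the metadata database, and the job queue. The class and method names (`UploadService`, `accept_upload`, `run_one_job`) are hypothetical and exist only for illustration.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UploadService:
    raw_store: dict = field(default_factory=dict)  # stand-in for object storage
    status: dict = field(default_factory=dict)     # stand-in for the metadata database
    jobs: deque = field(default_factory=deque)     # stand-in for a durable job queue

    def accept_upload(self, data: bytes) -> str:
        """User-facing step: store the raw bytes, record status, enqueue work, return."""
        video_id = uuid.uuid4().hex
        self.raw_store[f"raw/{video_id}"] = data
        self.status[video_id] = "processing"
        self.jobs.append(video_id)
        return video_id  # the user gets an acknowledgement here, before any transcoding

    def run_one_job(self) -> None:
        """Background step: a worker picks up one job later and marks the video ready."""
        video_id = self.jobs.popleft()
        # Real transcoding would happen here; this sketch only flips the status.
        self.status[video_id] = "ready"
```

The upload call returns as soon as the raw file is stored and the job is queued. Playability arrives later, when a worker drains the queue.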

Reasonable v1 scope

Upload a video.
Store metadata.
Transcode into a few playback renditions.
Make the video playable.

Layer on later

Thumbnails.
Captions.
Moderation.
Recommendations.
Analytics.

The core design move is separating what the user is waiting for from the heavy work the system can do later.

Why it matters in interviews

This problem ties together storage, async work, and global delivery.

Interviewers like video upload because it pulls together object storage, queues, asynchronous workers, metadata versus blob storage, and CDN-backed playback in one concrete system.

Weak opener

User uploads the video and we store it and then stream it.

Strong opener

I want the upload path to acknowledge durable receipt quickly, then trigger asynchronous transcoding and thumbnail generation. Metadata lives in the database, raw and processed video files live in object storage, and playback should come through a CDN.

The stronger answer separates the user-facing step, the background pipeline, and the playback delivery path.

Key ideas

Six anchors.

Upload acceptance should usually be fast and separate from heavy processing.
Metadata belongs in a database; video files belong in object storage.
Transcoding is asynchronous work and should scale independently.
Playback is read-heavy and usually belongs behind a CDN.
Raw upload, processed renditions, and thumbnails may all be separate stored objects.
Direct-to-storage upload often keeps the application tier out of the heaviest byte path.
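One way to picture the metadata-versus-blob split is a small record that holds only object keys. This is a sketch under assumptions: the field names and the key layout (`raw/...`, `renditions/...`) are hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VideoRecord:
    """Small metadata row: it stores object keys, never the video bytes themselves."""
    video_id: str
    owner_id: str
    title: str
    status: str = "processing"
    raw_key: str = ""                                              # raw upload object
    rendition_keys: Dict[str, str] = field(default_factory=dict)  # one object per rendition
    thumbnail_key: Optional[str] = None                            # may be added later


def new_record(video_id: str, owner_id: str, title: str) -> VideoRecord:
    # Hypothetical key layout for the raw upload.
    return VideoRecord(video_id, owner_id, title, raw_key=f"raw/{video_id}/original")


def attach_rendition(record: VideoRecord, name: str) -> str:
    # Each processed rendition is its own stored object, referenced by key.
    key = f"renditions/{record.video_id}/{name}.mp4"
    record.rendition_keys[name] = key
    return key
```

The database row stays small and queryable, while the raw upload, renditions, and thumbnail each live as separate objects.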

Speaking script

Lines you can actually say out loud.

Opening

I want the user-facing upload path to acknowledge durable receipt quickly, not wait for full processing.

Sketching

The metadata goes in the database, while the raw video and processed renditions live in object storage.

Deep dive

Transcoding is slow and should happen asynchronously through a queue and worker pool.

Deep dive

Playback is read-heavy and globally distributed, so I would serve processed video through a CDN.

Extending

If uploads are very large, I would consider direct upload to object storage so the application tier does not proxy every byte.

Defending

The trade-off is a cleaner, faster user-facing path in exchange for more asynchronous pipeline complexity.
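If the interviewer asks what "a queue and worker pool" looks like, a rough sketch using Python's standard `queue` and `threading` modules is below. The `fake_transcode` function and the job tuples are hypothetical placeholders; a real pipeline would call an actual transcoder and use a durable queue.

```python
import queue
import threading


def fake_transcode(video_id: str, rendition: str) -> str:
    # Placeholder for real transcoding; returns the key of the processed object.
    return f"renditions/{video_id}/{rendition}.mp4"


def run_worker_pool(jobs, num_workers: int) -> dict:
    """Drain a list of (video_id, rendition) jobs with a fixed number of worker threads."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                video_id, rendition = q.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker exits
            key = fake_transcode(video_id, rendition)
            with lock:
                results[(video_id, rendition)] = key

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Changing `num_workers` changes processing capacity without touching the upload path, which is the point of scaling the two independently.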

Common mistakes

Predictable ways this answer goes wrong.

Blocking the upload response on full transcoding.
Storing large video blobs in the main relational database.
Mixing metadata queries with raw media delivery.
Forgetting that playback and upload are different traffic shapes.
Ignoring thumbnails, renditions, or processing status.
Putting all bytes through the application tier when object storage can handle the bulk data path more directly.
Treating CDN delivery as optional even when playback is public and repeated heavily.

Misconception check

Correct the wrong model before it sticks.

Wrong intuition

What feels tempting

Video upload is mainly a large file upload problem.

Better model

What to replace it with

Video upload is a pipeline: accept bytes, store safely, transcode asynchronously, update metadata, and serve playback with the right access rules.
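Because the pipeline has stages, the metadata usually carries a processing status. The sketch below assumes one possible set of status names and transitions; both are illustrative choices, not something the chapter prescribes.

```python
# Hypothetical status names and allowed transitions for a video record.
ALLOWED_TRANSITIONS = {
    "pending_upload": {"uploaded", "failed"},
    "uploaded": {"processing"},
    "processing": {"ready", "failed"},
    "failed": {"processing"},  # allow a retry of the processing step
    "ready": set(),            # terminal state in this sketch
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the pipeline cannot move that way."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new!r}")
    return new
```

Playback can then check for `ready` before serving, which keeps half-processed videos out of the viewer path.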

Interview move

What to do in the room

Draw ingestion and playback as separate paths, then deep dive on processing or authorization.

Trade-offs

The decisions that come up every time.

Video design choice	Good when	Weak when	Interview line
App server proxies the full upload	The system is small and simplicity matters more than throughput.	Upload volume or file size becomes large enough to overload the app tier.	I would proxy uploads only while scale is small; large files push me toward direct object-storage upload.
Direct upload to object storage (default)	Uploads are large and you want the app tier out of the heavy byte path.	You need very tight inline validation before any bytes are stored.	Direct upload keeps the heavy file transfer away from the core application servers.
Synchronous processing	The files are tiny and immediate readiness matters more than throughput.	Transcoding is expensive and users would wait too long.	I would not make the user wait for full transcoding unless the processing is trivial.
Async transcoding pipeline (default)	Processing is slow, bursty, or parallelizable.	The product requires instant ready-to-play output from the same request.	Async processing keeps the upload path fast while letting transcoding scale independently.
CDN playback	Videos are public and read-heavy across regions.	Playback volume is tiny, or the videos are private or too low-traffic to justify the edge layer yet.	A CDN is high leverage because playback reads dominate and media files are large.

Deep dive

Should the app tier proxy the upload, or should the client upload directly?

This is one of the most useful design choices to explain out loud. The real question is where you want the heavy byte path to go.

Proxying is simpler early. Direct upload usually wins once files get large enough that the application tier should stop sitting in the middle of every byte transfer.
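To make the byte path concrete, here is a toy accounting sketch. Everything in it is an assumption for illustration: `ObjectStore`, `AppTier`, and the grant dictionary are hypothetical, and a real system would hand the client a time-limited upload credential rather than a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class AppTier:
    store: ObjectStore
    bytes_through_app: int = 0  # how much video data the app servers had to carry

    def proxy_upload(self, key: str, data: bytes) -> None:
        # Proxy model: every byte passes through the application tier.
        self.bytes_through_app += len(data)
        self.store.put(key, data)

    def authorize_direct_upload(self, key: str) -> dict:
        # Direct model: the app only decides whether and where the client may upload.
        return {"key": key, "method": "PUT"}


def client_direct_upload(app: AppTier, store: ObjectStore, key: str, data: bytes) -> None:
    grant = app.authorize_direct_upload(key)  # small control-plane call
    store.put(grant["key"], data)             # heavy bytes go straight to storage
```

In both cases the file ends up in object storage. The difference is whether the application tier carried the bytes or only the authorization decision.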

Mini case study

A newly uploaded video becomes popular after being shared.

This is the playback pressure test. The first thing that gets hot is usually reads, not the original upload path.

What gets hot

Playback reads.
Not the original upload path.

What helps first

Serve processed renditions from a CDN.
Keep object storage as durable origin.
Keep metadata reads separate from video byte delivery.

What should not happen

Every viewer pulls video directly from origin without edge caching.
App servers sit in the middle of every playback request.

The lesson

At scale, playback often becomes the dominant path.
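A toy simulation shows why the edge layer helps first. It assumes a single edge cache with no eviction or expiry, which is far simpler than a real CDN; `Origin` and `EdgeCache` are hypothetical names.

```python
class Origin:
    """Stand-in for object storage acting as the durable origin."""

    def __init__(self, objects: dict):
        self.objects = objects
        self.fetches = 0

    def get(self, key: str) -> bytes:
        self.fetches += 1
        return self.objects[key]


class EdgeCache:
    """Stand-in for one CDN edge: serve from cache, fall back to origin on a miss."""

    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache = {}
        self.hits = 0

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        data = self.origin.get(key)  # a miss goes back to origin once
        self.cache[key] = data
        return data


origin = Origin({"renditions/v1/720p.mp4": b"video-bytes"})
edge = EdgeCache(origin)
for _ in range(1000):  # 1000 viewers request the same rendition
    edge.get("renditions/v1/720p.mp4")
```

In this simplified model the origin sees one fetch for a thousand views. Without the edge, every one of those requests would land on origin.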

Demo conversation

How a strong exchange sounds.

Interviewer

What are the main lanes in this system?

Candidate

Upload, processing, and playback. They have very different latency needs, so I do not want one path to carry the burden of all three jobs.

Interviewer

Would you proxy all video uploads through your app servers?

Candidate

Not at larger scale. I would keep metadata control in the app layer, but let the heavy video bytes go directly to object storage so the app tier does not become the bandwidth bottleneck.

Interviewer

What do you optimize first for viewers?

Candidate

Playback readiness and delivery efficiency. That means processed renditions, segmented delivery, and CDN distribution matter much earlier than fancy upload-side features.

Worked example to solo answer

Fade the support before the real practice.

Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.

I do

Study the model move.

I would say: "Upload success means the original file is durably stored; playback readiness can happen later after transcoding."

We do

Complete the missing piece.

For private videos, mark which checks happen before upload, before metadata read, and before playback.

You do

Answer without notes.

Answer the practice prompt with two paths: upload pipeline and authorized playback.

Practice

Try it before you read the model answer.

Prompt

Design a video system that supports private videos visible only to authorized users.

What stays the same?
What changes on the playback path?
What trade-off appears?

Show a strong model answer

The upload, metadata, object storage, and asynchronous transcoding pipeline stay mostly the same. The main change is on playback: I now need authorization before serving the media, which may mean signed URLs, short-lived access tokens, or another controlled origin access pattern in front of the CDN. The trade-off is stronger access control versus a slightly more complex playback flow and edge-caching strategy.
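As one illustration of the signed-URL option, the sketch below signs a playback path with an expiry using Python's standard `hmac` module. The URL format, parameter names, and `SECRET` are assumptions made for this example; real CDNs each define their own signing scheme.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical shared secret between the app and the edge


def _signature(path: str, user_id: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}|{user_id}|{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_playback_path(path: str, user_id: str, expires_at: int,
                       secret: bytes = SECRET) -> str:
    """Issued by the app tier after it has checked that the user may watch the video."""
    sig = _signature(path, user_id, expires_at, secret)
    return f"{path}?user={user_id}&expires={expires_at}&sig={sig}"


def verify_playback(path: str, user_id: str, expires_at: int, sig: str,
                    now: int, secret: bytes = SECRET) -> bool:
    """Checked at the edge or origin before any media bytes are served."""
    if now > expires_at:
        return False  # short-lived: expired links stop working
    expected = _signature(path, user_id, expires_at, secret)
    return hmac.compare_digest(expected, sig)
```

The authorization decision stays in the application layer, while the edge only needs to verify a signature and an expiry time.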

Training loop

Make this chapter stick.

Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.

Recall

Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.

Vary

Change one constraint in the practice prompt and answer again in half the time.

Score

Use the rubric to pick one dimension below 3, then retry only that dimension.

Recap

Three things to take into the room.

Acknowledge early.

The user should not wait for transcoding.

Process later.

Transcoding, thumbnails, and status updates belong in the background pipeline.

Serve from the edge.

Playback is where the read-heavy scale shows up.

Reusable interview line

"I would separate upload acceptance, background processing, and playback delivery: acknowledge durable receipt early, transcode asynchronously, and serve processed renditions through a CDN."
