Accepting large uploads, processing them safely, and serving playback at scale.
Which part of video upload is user-facing, and which part is background processing?
Before reading, split the design into upload, storage, processing, metadata, and playback.
Video combines object storage, queues, workers, CDN, metadata, permissions, and reliability.
Video systems look intimidating because the files are large and the pipeline has several stages. The clean way to think about them is simple: accept the upload, store the raw file durably, process it into playable formats, then serve playback efficiently.
That sequence matters. You do not want to make the user wait for full transcoding before acknowledging the upload. Upload, processing, and playback are different traffic shapes and should not be treated as one undifferentiated path.
The core design move is separating what the user is waiting for from the heavy work the system can do later.
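A minimal sketch of that separation, assuming in-memory stand-ins for object storage, the metadata store, and the job queue. The names `accept_upload`, `run_worker_once`, `raw_store`, `metadata`, and `transcode_jobs` are hypothetical and exist only for illustration.

```python
import queue
import uuid

# In-memory stand-ins (assumptions): a real system would use durable
# object storage, a database, and a durable queue.
raw_store = {}
metadata = {}
transcode_jobs = queue.Queue()


def accept_upload(owner_id: str, data: bytes) -> str:
    """User-facing step: store the raw bytes, record metadata, enqueue work."""
    video_id = uuid.uuid4().hex
    raw_store[video_id] = data
    metadata[video_id] = {"owner": owner_id, "status": "uploaded", "renditions": []}
    transcode_jobs.put(video_id)
    # The upload is acknowledged here, before any transcoding has run.
    return video_id


def run_worker_once() -> None:
    """Background step: take one job and mark the video playable."""
    video_id = transcode_jobs.get()
    metadata[video_id]["status"] = "processing"
    # Placeholder for real transcoding; the rendition names are illustrative.
    metadata[video_id]["renditions"] = ["360p", "720p"]
    metadata[video_id]["status"] = "ready"
    transcode_jobs.task_done()
```

The user only ever waits on `accept_upload`. The worker can be scaled, retried, or delayed without changing what the upload request returns.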
Interviewers like video upload because it pulls together object storage, queues, asynchronous workers, metadata versus blob storage, and CDN-backed playback in one concrete system.
The stronger answer separates the user-facing step, the background pipeline, and the playback delivery path.
A common weak framing is that video upload is mainly a large file upload problem.
A better framing is that video upload is a pipeline: accept bytes, store them safely, transcode asynchronously, update metadata, and serve playback with the right access rules.
Draw ingestion and playback as separate paths, then deep dive on processing or authorization.
| Video design choice | Good when | Weak when | Interview line |
|---|---|---|---|
| App server proxies the full upload | The system is small and simplicity matters more than throughput. | Upload volume or file size becomes large enough to overload the app tier. | I would proxy uploads only while scale is small; large files push me toward direct object-storage upload. |
| Direct upload to object storage (default) | Uploads are large and you want the app tier out of the heavy byte path. | You need very tight inline validation before any bytes are stored. | Direct upload keeps the heavy file transfer away from the core application servers. |
| Synchronous processing | The files are tiny and immediate readiness matters more than throughput. | Transcoding is expensive and users would wait too long. | I would not make the user wait for full transcoding unless the processing is trivial. |
| Async transcoding pipeline (default) | Processing is slow, bursty, or parallelizable. | The product requires instant ready-to-play output from the same request. | Async processing keeps the upload path fast while letting transcoding scale independently. |
| CDN playback | Videos are public and read-heavy across regions. | Playback is private or too low-traffic to justify the edge layer yet. | A CDN is high leverage because playback reads dominate and media files are large. |
The upload path is one of the most useful design choices to explain out loud. The real question is where you want the heavy byte path to go.
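One way to keep the heavy bytes off the app tier is for the app server to hand out a short-lived signed upload URL, which the client then uses to send the file straight to storage. The sketch below uses a made-up HMAC scheme from the Python standard library, not any real provider's signing format. The host `storage.example`, the key `SECRET`, and all function names are assumptions.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared by the app tier and storage tier


def sign_upload(object_key: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Sign the method, object key, and expiry so none of them can be altered."""
    message = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_upload_url(object_key: str, now: int, ttl_seconds: int = 900) -> str:
    """App tier: issue a URL; the file bytes never pass through this server."""
    expires_at = now + ttl_seconds
    signature = sign_upload(object_key, expires_at)
    return f"https://storage.example/{object_key}?expires={expires_at}&sig={signature}"


def storage_accepts(object_key: str, expires_at: int, signature: str, now: int) -> bool:
    """Storage tier: accept the PUT only if the URL is unexpired and untampered."""
    if now > expires_at:
        return False
    expected = sign_upload(object_key, expires_at)
    return hmac.compare_digest(expected, signature)
```

The app server does a cheap signing step per upload, while the storage tier carries the transfer itself. Managed object stores typically offer their own presigned-upload mechanism, which would replace this hand-rolled scheme in practice.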
Playback is the pressure test. The first thing that gets hot is usually reads, not the original upload path.
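A toy model of why an edge cache helps when reads dominate: only the first request for a segment reaches the origin. The `EdgeCache` class and the segment key are hypothetical, and a real CDN also handles eviction, TTLs, and many edge locations.

```python
class EdgeCache:
    """Simplified single-node edge cache in front of an origin store."""

    def __init__(self, origin: dict):
        self.origin = origin
        self.cached = {}
        self.origin_fetches = 0
        self.hits = 0

    def get(self, segment_key: str) -> bytes:
        if segment_key in self.cached:
            self.hits += 1
        else:
            # Cache miss: fetch from origin once, then serve from the edge.
            self.origin_fetches += 1
            self.cached[segment_key] = self.origin[segment_key]
        return self.cached[segment_key]


origin = {"v1/720p/seg0": b"segment-bytes"}
edge = EdgeCache(origin)
for _ in range(1000):
    edge.get("v1/720p/seg0")
```

In this run, 1000 playback reads produce a single origin fetch and 999 edge hits, which is the effect a popular video relies on.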
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would say: "Upload success means the original file is durably stored; playback readiness can happen later after transcoding."
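That distinction can be made explicit as a small status model. The sketch below is one possible set of states. The `failed` state and the retry transition are assumptions added for illustration, as are all the names.

```python
# Allowed status transitions for a video record (illustrative assumption).
ALLOWED = {
    "pending_upload": {"uploaded"},
    "uploaded": {"processing"},
    "processing": {"ready", "failed"},
    "failed": {"processing"},  # retry transcoding from the stored original
    "ready": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current} to {new}")
    return new


def upload_succeeded(status: str) -> bool:
    """True once the original file is durably stored."""
    return status != "pending_upload"


def playable(status: str) -> bool:
    """True only after transcoding has produced playable output."""
    return status == "ready"
```

A video in `uploaded` or `processing` counts as a successful upload but is not yet playable, so the client can show "processing" instead of blocking the upload request.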
For private videos, mark which checks happen before upload, before metadata read, and before playback.
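As a sketch of those three checkpoints, assuming a simple owner-plus-allowlist permission model, the checks might look like the following. The `Video` fields, the quota rule, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Video:
    owner: str
    visibility: str = "private"  # "private" or "public"
    status: str = "uploaded"     # becomes "ready" after transcoding
    allowed_viewers: Set[str] = field(default_factory=set)


def can_start_upload(user: Optional[str], size_bytes: int, quota_remaining: int) -> bool:
    """Before upload: the caller is authenticated and within quota."""
    return user is not None and size_bytes <= quota_remaining


def can_read_metadata(user: Optional[str], video: Video) -> bool:
    """Before metadata read: public, or the owner, or an allowed viewer."""
    if video.visibility == "public":
        return True
    return user is not None and (user == video.owner or user in video.allowed_viewers)


def can_play(user: Optional[str], video: Video) -> bool:
    """Before playback: same access rule, and the video must be ready."""
    return video.status == "ready" and can_read_metadata(user, video)
```

Keeping the playback check separate from the metadata check lets the playback path issue short-lived, per-viewer access without reopening the metadata rules.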
Answer the practice prompt with two paths: upload pipeline and authorized playback.
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
The user should not wait for transcoding.
Transcoding, thumbnails, and status updates belong in the background pipeline.
Playback is where the read-heavy scale shows up.