Delivering messages quickly enough, in a way users can trust.
What must a chat system promise besides “message appears quickly”?
Before reading, list one delivery guarantee, one ordering rule, and one offline behavior.
Chat combines realtime connection management, message storage, ordering, fan-out, and recovery.
To the user, chat feels simple: type a message, hit send, see it appear. Underneath, the sender might be online or offline, the receiver might be online or offline, the message has to be stored durably, and delivery still has to make sense inside each conversation.
A clean first version is one-to-one text chat with stored history, live delivery when possible, and a push notification when the receiver is offline. That scope is enough to surface the real design questions without drowning in group fan-out, media, or presence.
The most important simplification is this: you usually care about order within a conversation, not global order across the entire system.
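To make the simplification concrete, here is a minimal sketch of per-conversation ordering. It assumes an in-memory counter per conversation; the names `ConversationSequencer`, `Message`, and the conversation ids are hypothetical, and a real system would keep the counter next to the durable message store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    conversation_id: str
    seq: int      # position inside this conversation only
    sender: str
    text: str


class ConversationSequencer:
    """Hands out increasing sequence numbers, one counter per conversation."""

    def __init__(self) -> None:
        self._last_seq = defaultdict(int)

    def assign(self, conversation_id: str, sender: str, text: str) -> Message:
        # Only this conversation's counter moves; no global coordination.
        self._last_seq[conversation_id] += 1
        return Message(conversation_id, self._last_seq[conversation_id], sender, text)


sequencer = ConversationSequencer()
a1 = sequencer.assign("alice:bob", "alice", "hi")
c1 = sequencer.assign("carol:dave", "carol", "lunch?")
a2 = sequencer.assign("alice:bob", "bob", "hello")
```

The two conversations never compare sequence numbers with each other, so they can live on different servers without agreeing on a shared clock.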
Interviewers like chat because it reveals whether the candidate can reason about persistent connections, durable storage, per-conversation ordering, live delivery, store-and-forward behavior, and the difference between sending a message and notifying someone about it.
The stronger answer explains user-visible behavior, durable acceptance, and delivery fallback in one pass.
A common misconception: "Chat is solved once you add WebSockets."
WebSockets are only the connection. The system also needs message durability, ordering, delivery state, offline handling, and group fan-out.
Describe the message lifecycle: send, persist, fan out, deliver, acknowledge, and recover.
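One way to rehearse the lifecycle is to write it as a small state machine. This is a sketch under assumptions: the stage names mirror the list above, and the choice to model "recover" as a step back to the persisted state is an illustration, not the only valid model.

```python
from enum import Enum, auto


class Stage(Enum):
    SENT = auto()          # client submitted the message
    PERSISTED = auto()     # durably stored
    FANNED_OUT = auto()    # handed to the delivery path
    DELIVERED = auto()     # reached the receiving client
    ACKNOWLEDGED = auto()  # receiving client confirmed receipt


# Allowed transitions. Falling back to PERSISTED is the recover path:
# the stored copy is the source for a later redelivery.
ALLOWED = {
    Stage.SENT: {Stage.PERSISTED},
    Stage.PERSISTED: {Stage.FANNED_OUT},
    Stage.FANNED_OUT: {Stage.DELIVERED, Stage.PERSISTED},
    Stage.DELIVERED: {Stage.ACKNOWLEDGED, Stage.PERSISTED},
    Stage.ACKNOWLEDGED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target if the lifecycle permits it, otherwise raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


stage = advance(Stage.SENT, Stage.PERSISTED)
stage = advance(stage, Stage.FANNED_OUT)
stage = advance(stage, Stage.PERSISTED)  # delivery failed, recover from the store
```

Note that there is no edge from `SENT` straight to `DELIVERED`: the sketch refuses to deliver anything that was not stored first.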
| Chat design choice | Good when | Weak when | Interview line |
|---|---|---|---|
| Polling for new messages | Scale is small and realtime expectations are modest. | Users expect fast interactive delivery. | Polling is the simplest start, but persistent connections usually fit realtime chat better. |
| Persistent connection delivery (default) | Low-latency message delivery matters. | Connection management complexity is not justified for a very small system. | A persistent connection gives me fast delivery without repeated polling overhead. |
| Per-conversation ordering (default) | Users need messages to make sense within each chat. | The design is forced into unnecessary global coordination. | Conversation-level ordering is the guarantee I actually need here. |
| Acknowledge only after durable write | Message loss is unacceptable after the UI says sent. | The system is over-optimized for latency at the cost of user trust. | I would not treat the message as accepted until it is durably stored. |
| Push notifications for offline users | Users may be disconnected but still need awareness of new messages. | Push delivery is treated as the primary transport rather than a fallback. | Push is my offline alert path, not my main message channel. |
Many weak answers blur these together. A cleaner answer separates three questions: was the message accepted, was it delivered live, and was the user notified?
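The three questions can be kept apart by recording them as three separate flags. The sketch below is hypothetical (`DeliveryStatus` and `record_send` are illustrative names) and assumes the simple one-to-one scope: a message is either delivered over a live connection or announced by a push notification.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStatus:
    accepted: bool = False        # durably stored; safe to show "sent"
    delivered_live: bool = False  # sent over an open connection
    notified: bool = False        # offline alert (push notification) issued


def record_send(stored: bool, receiver_connected: bool) -> DeliveryStatus:
    status = DeliveryStatus()
    if not stored:
        return status             # nothing else happens before acceptance
    status.accepted = True
    if receiver_connected:
        status.delivered_live = True
    else:
        status.notified = True
    return status


offline_case = record_send(stored=True, receiver_connected=False)
failed_write = record_send(stored=False, receiver_connected=True)
```

In the offline case the message is accepted and the user is notified, yet nothing has been delivered live. Three flags make that state easy to say out loud.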
The clean store-and-forward test is a sender who is online and a receiver who is not. If the design only works when both users are connected, it is incomplete.
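A minimal sketch of store-and-forward behavior, assuming in-memory structures stand in for the durable store, the connection registry, and the push service. The class and attribute names are hypothetical.

```python
from collections import defaultdict


class StoreAndForward:
    def __init__(self) -> None:
        self.stored = []                   # (receiver, text): durable log stand-in
        self.connected = set()             # users with a live connection
        self.inbox = defaultdict(list)     # what each client has received
        self.pending = defaultdict(list)   # held for offline receivers
        self.push_alerts = []              # push notifications issued

    def send(self, receiver: str, text: str) -> None:
        self.stored.append((receiver, text))      # store first, always
        if receiver in self.connected:
            self.inbox[receiver].append(text)     # forward immediately
        else:
            self.pending[receiver].append(text)   # hold for later
            self.push_alerts.append(receiver)     # alert only, not the transport

    def connect(self, user: str) -> None:
        self.connected.add(user)
        # Forward everything that was held while the user was away.
        self.inbox[user].extend(self.pending.pop(user, []))


relay = StoreAndForward()
relay.send("bob", "hi")             # bob is offline
relay.send("bob", "you there?")     # still offline
relay.connect("bob")                # held messages are forwarded
relay.send("bob", "welcome back")   # now delivered live
```

The push alert carries no message body in this sketch. The message itself only travels from the store to the client, which keeps push as the alert path and not the main channel.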
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would say: "I will store the message before fan-out so reconnecting clients can recover missed messages."
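That line can be backed by a small recovery sketch. It assumes each conversation has an append-only log and that the client remembers the last sequence number it saw; `ConversationLog` and `since` are hypothetical names.

```python
class ConversationLog:
    """Append-only history for one conversation; seq starts at 1."""

    def __init__(self) -> None:
        self._messages = []

    def append(self, text: str) -> int:
        self._messages.append(text)   # store before any fan-out
        return len(self._messages)    # seq of the message just stored

    def since(self, last_seen_seq: int) -> list:
        """Return (seq, text) for every message after last_seen_seq."""
        return [
            (seq, text)
            for seq, text in enumerate(self._messages, start=1)
            if seq > last_seen_seq
        ]


log = ConversationLog()
log.append("a")
log.append("b")
log.append("c")

# A client that disconnected after seeing seq 1 asks for what it missed.
missed = log.since(1)
```

Because the cursor is a per-conversation sequence number, recovery needs no global ordering: the client asks each conversation for everything after its own last seen position.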
For large groups, compare direct fan-out with topic or stream-based delivery.
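The comparison can be sketched by counting writes per message. This is a simplified model under assumptions: direct fan-out writes one copy per member, while the stream approach appends once and lets each member read from a stored offset. All names here are hypothetical.

```python
def direct_fanout(members: list, text: str, inboxes: dict) -> int:
    """Write one copy per member; return the number of writes performed."""
    for member in members:
        inboxes.setdefault(member, []).append(text)
    return len(members)


class GroupStream:
    """One shared append-only stream; each member tracks a read offset."""

    def __init__(self) -> None:
        self.entries = []
        self.offsets = {}

    def publish(self, text: str) -> int:
        self.entries.append(text)
        return 1                      # one write, regardless of group size

    def read(self, member: str) -> list:
        start = self.offsets.get(member, 0)
        self.offsets[member] = len(self.entries)
        return self.entries[start:]


members = [f"user{i}" for i in range(1000)]
inboxes = {}
direct_writes = direct_fanout(members, "hi", inboxes)

stream = GroupStream()
stream_writes = stream.publish("hi")
first_read = stream.read("user7")
second_read = stream.read("user7")
```

Direct fan-out pays at write time and keeps reads trivial. The stream pays at read time and keeps writes constant, which usually matters more as the group grows.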
Answer the practice prompt using the message lifecycle, not only the WebSocket box.
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
Durable message flow is the real backbone.
Global ordering usually adds cost without improving the product.
Store-and-forward behavior is part of the main design.