Sending the right message, through the right channel, without losing user trust.
What can go wrong if a notification system simply calls email, SMS, and push APIs directly?
Before reading, name one user preference, one retry risk, and one duplicate risk.
Notifications combine events, preferences, queues, provider failures, idempotency, and user trust.
Notification systems look small from the outside: something happens, then a user gets notified. In reality, the system has to decide which events matter, who should receive them, which channel should be used, what the user preferences allow, and what should happen if a provider is down.
A clean first version is enough to expose the real design: receive an event, decide recipients, check preferences, create channel-specific jobs, then let channel workers send through providers. That is already much better than treating the problem like one direct email API call.
The key simplification is separating intent from delivery. Once you do that, the rest of the architecture gets much clearer.
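To make the split concrete, here is a minimal sketch of the intent side only: it turns one event into channel-specific jobs and never touches a provider. The names `Event`, `ChannelJob`, `PREFERENCES`, `ROUTES`, and `plan_jobs` are hypothetical, and the in-memory dictionaries stand in for whatever preference store and routing rules a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    kind: str
    user_ids: tuple


@dataclass(frozen=True)
class ChannelJob:
    event_id: str
    user_id: str
    channel: str


# Hypothetical preference store: user -> channels the user allows.
PREFERENCES = {
    "alice": {"email", "push"},
    "bob": {"sms"},
}

# Hypothetical routing rules: event kind -> channels the product wants to use.
ROUTES = {"order_shipped": ("email", "push", "sms")}


def plan_jobs(event):
    """Decide recipients and channels; delivery happens elsewhere."""
    jobs = []
    for user_id in event.user_ids:
        allowed = PREFERENCES.get(user_id, set())
        for channel in ROUTES.get(event.kind, ()):
            if channel in allowed:  # preferences filter the desired channels
                jobs.append(ChannelJob(event.event_id, user_id, channel))
    return jobs


jobs = plan_jobs(Event("evt-1", "order_shipped", ("alice", "bob")))
for job in jobs:
    print(job)
```

Each `ChannelJob` would then be placed on the queue for its channel, so a worker can send it without knowing why the notification exists.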
Interviewers like notification systems because they reveal whether the candidate distinguishes event intake from delivery, respects user preferences, isolates channels with queues, handles provider failures, and retries safely without duplicates.
The stronger answer separates intent, policy, and delivery. That makes the system sound deliberate instead of ad hoc.
Weak framing: Notifications are just API calls to external providers.
Stronger framing: Notifications are an orchestration pipeline with preferences, channel choice, retries, provider failures, and duplicate control.
What to model: event intake, preference resolution, channel queues, provider adapters, and delivery state.
| Notification choice | Good when | Weak when | Interview line |
|---|---|---|---|
| Single direct send path | The system is tiny and there is one simple channel. | Multiple channels, retries, and provider failures matter. | A direct send path is fine only for the smallest first version. |
| One queue for all notifications | Traffic is low and channel behavior is similar. | Channels have very different retry patterns, priorities, or providers. | One queue is simpler, but separate channel queues give better isolation and control. |
| Per-channel queues and workers (default) | Channels fail differently and should scale or retry independently. | The system is so small the extra queues add unnecessary overhead. | Per-channel queues help me isolate failures and tune retries independently. |
| Immediate send for every event | The event is transactional and user-facing timing matters. | Many notifications could be digested, batched, or deprioritized. | I would send urgent transactional notifications immediately, but batch or digest lower-priority ones. |
| Retry with dedupe (default) | Provider failures are transient and eventual delivery matters. | Retries are unsafe and duplicate protection is weak. | Retries improve delivery, but only if I can prevent duplicates cleanly. |
This is the reliability question in the chapter. Temporary provider failures are common and can be retried. Duplicate user-visible sends are what destroy trust.
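One way to picture retrying without duplicating is an idempotency key checked before every send. The sketch below is illustrative only: `FlakyProvider`, `TransientProviderError`, `send_once`, and the key format are all hypothetical, and the in-memory `sent_keys` set stands in for a durable store.

```python
class TransientProviderError(Exception):
    """Raised by the hypothetical provider for retryable failures."""


class FlakyProvider:
    """Hypothetical provider that fails a fixed number of calls, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.delivered = []

    def send(self, user_id, body):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TransientProviderError("provider timeout")
        self.delivered.append((user_id, body))


# Assumption: in a real system this would be a durable, shared store.
sent_keys = set()


def send_once(provider, event_id, user_id, channel, body, max_attempts=3):
    """Retry transient failures, but never send the same key twice."""
    key = f"{event_id}:{user_id}:{channel}"
    if key in sent_keys:
        return "duplicate_skipped"
    for _ in range(max_attempts):
        try:
            provider.send(user_id, body)
        except TransientProviderError:
            continue  # transient failure: try again
        sent_keys.add(key)  # record success so a redelivered job is skipped
        return "sent"
    return "failed"


provider = FlakyProvider(failures_before_success=2)
first = send_once(provider, "evt-1", "alice", "email", "Your order shipped")
second = send_once(provider, "evt-1", "alice", "email", "Your order shipped")
print(first, second, len(provider.delivered))
```

The sketch does not cover the harder case where the provider accepted the message but the response was lost. Handling that usually needs the provider itself to recognize the key, which is an assumption about the provider rather than something this code can guarantee.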
This is the clean routing test. One product event may produce several channel-specific attempts, each with its own failure behavior.
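A small sketch of per-channel delivery state shows why one event cannot be tracked with a single status. The sender functions and the `Status` values are hypothetical, and the SMS sender is hard-coded to fail so the divergence is visible.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SENT = "sent"
    FAILED = "failed"


# Hypothetical provider adapters; the SMS one simulates an outage.
def send_email(event_id):
    return None


def send_sms(event_id):
    raise RuntimeError("sms provider down")


def send_push(event_id):
    return None


SENDERS = {"email": send_email, "sms": send_sms, "push": send_push}


def deliver(event_id, channels):
    """Attempt each channel independently and record its own outcome."""
    state = {channel: Status.PENDING for channel in channels}
    for channel in channels:
        try:
            SENDERS[channel](event_id)
            state[channel] = Status.SENT
        except RuntimeError:
            # One channel failing must not block or fail the others.
            state[channel] = Status.FAILED
    return state


state = deliver("evt-42", ["email", "sms", "push"])
for channel, status in state.items():
    print(channel, status.value)
```

In a queued design each channel would run in its own worker rather than a single loop, but the state shape stays the same: one status per event, recipient, and channel.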
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would say: "The order event creates a notification job, preferences choose channels, and each channel has its own retry policy."
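"Its own retry policy" can be as simple as a small per-channel configuration. The sketch below assumes capped exponential backoff; the `RetryPolicy` type and every number in `POLICIES` are illustrative placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int    # total attempts, including the first send
    base_delay_s: float  # delay before the first retry
    max_delay_s: float   # cap on any single delay

    def delay_before_retry(self, retry_number):
        """Delay before retry `retry_number` (1-based), doubling each time."""
        return min(self.base_delay_s * 2 ** (retry_number - 1), self.max_delay_s)


# Hypothetical policies: urgent channels retry quickly, email retries for longer.
POLICIES = {
    "push": RetryPolicy(max_attempts=3, base_delay_s=1.0, max_delay_s=10.0),
    "sms": RetryPolicy(max_attempts=2, base_delay_s=5.0, max_delay_s=30.0),
    "email": RetryPolicy(max_attempts=5, base_delay_s=30.0, max_delay_s=600.0),
}


def retry_schedule(channel):
    """List the delays between attempts for one channel."""
    policy = POLICIES[channel]
    return [policy.delay_before_retry(n) for n in range(1, policy.max_attempts)]


for channel in POLICIES:
    print(channel, retry_schedule(channel))
```

Keeping the policy as data means a worker can look it up by channel, and tuning one channel never changes the behavior of another.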
For ride sharing, separate urgent trip notifications from marketing or summary messages.
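As a rough sketch of that separation, the function below sends urgent trip events down an immediate path and groups everything else into a per-user digest. The event kinds in `URGENT_KINDS` and the `split_by_urgency` helper are hypothetical examples, not a fixed taxonomy.

```python
from collections import defaultdict

# Hypothetical set of event kinds that must be sent right away.
URGENT_KINDS = {"driver_arriving", "trip_cancelled"}


def split_by_urgency(events):
    """Return (immediate sends, per-user digest buckets)."""
    immediate = []
    digests = defaultdict(list)
    for user_id, kind in events:
        if kind in URGENT_KINDS:
            immediate.append((user_id, kind))
        else:
            digests[user_id].append(kind)  # held for a later batched send
    return immediate, dict(digests)


events = [
    ("alice", "driver_arriving"),
    ("alice", "weekly_summary"),
    ("bob", "promo_discount"),
    ("alice", "promo_discount"),
    ("bob", "trip_cancelled"),
]
immediate, digests = split_by_urgency(events)
print(immediate)
print(digests)
```

The immediate list would feed the high-priority channel queues, while the digest buckets can wait for a scheduled batch job.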
Answer the practice prompt with one provider failure and one duplicate-prevention strategy.
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
Do not jump straight to provider calls.
Different channels fail differently and deserve independent control.
Reliability without dedupe becomes spam.