What the system promises when things are changing, breaking, or recovering.
Where would eventual consistency be harmless, and where would it violate user trust?
Before reading, write the invariant that must remain true even during failure.
This chapter returns in payments, chat ordering, notifications, feeds, counters, and private video access.
Candidates often say consistency, reliability, and resilience as if they all mean "good system." They do not. They answer different questions.
The clean way to explain them is simple. Consistency asks how quickly the system agrees on changed truth. Reliability asks whether the intended action completes correctly. Resilience asks what happens when something fails and whether the system keeps going or recovers cleanly.
The strong answer ties the guarantee to the user experience, the failure mode, and the system behavior.
High availability and eventual consistency are always the mature distributed-system answer.
The right promise depends on the domain. Some states can lag; some must be correct before the user is told success happened.
Name the user-visible promise, the failure mode, and the recovery path.
| Guarantee choice | Good when | Weak when | Interview line |
|---|---|---|---|
| Stronger consistency | Users must see correct up-to-date state before the action is considered complete. | Slight staleness is acceptable and the extra coordination hurts latency too much. | I would pay for stronger consistency only where the product promise truly needs it. |
| Eventual consistency Default | Slight delay between write and global visibility is acceptable. | The user expects immediate exact state everywhere. | Eventual consistency is fine here because a short delay in propagation does not break the user experience. |
| Retry for reliability Default | Failures are transient and the operation can be retried safely. | Retries create harmful duplicates and the operation is not idempotent. | Retries help reliability, but I need idempotency so repeated processing does not corrupt state. |
| Failover and redundancy | Service continuity matters during node or zone failures. | The system is small enough that added redundancy is not yet justified. | Redundancy improves resilience, but I only want the level of failover the product actually requires. |
| Degraded mode | Partial service is better than total outage during dependency failure. | The feature must be all-or-nothing to remain correct. | If one noncritical dependency fails, I would prefer degraded service over a full outage. |
Do not jump straight from reading to a full answer. First see the shape, then complete part of it, then answer alone.
I would say: "For payment confirmation, I prefer a slower clear answer over a fast misleading success."
For the payment prompt, separate confirmation, receipt email, and analytics into different consistency needs.
Choose one strong guarantee and one eventual side effect, then defend both.
Before moving on, turn recognition into production. Close the model answer, answer from memory, then retry one small slice.
Say the chapter's core idea without looking. Then name one related idea from an earlier chapter.
Change one constraint in the practice prompt and answer again in half the time.
Use the rubric to pick one dimension below 3, then retry only that dimension.
What should the user see and when?
Latency, coordination, redundancy, complexity, or all of them.
Do not stop at the happy path.