The CAP theorem states that it is impossible for a distributed system to simultaneously provide all three of the following guarantees:
- Consistency: All nodes see the same data at the same time.
- Availability: a guarantee that every request receives a response about whether it was successful or failed.
Availability in CAP is defined as “every request received by a non-failing [database] node in the system must result in a [non-error] response”.
- Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system.
A system is partition tolerant if processing can continue in both partitions in the case of a network failure.
In the article above, Robert Greiner neatly explains the CAP theorem with diagrams.
“Given that networks aren’t completely reliable, you must tolerate partitions in a distributed system, period. Fortunately, though, you get to choose what to do when a partition does occur. According to the CAP theorem, this means we are left with two options: Consistency and Availability.”
“CP – Consistency/Partition Tolerance – Wait for a response from the partitioned node which could result in a timeout error. The system can also choose to return an error, depending on the scenario you desire. Choose Consistency over Availability when your business requirements dictate atomic reads and writes. ”
Some distributed systems prefer CP: when a partition occurs, the part of the cluster that holds quorum continues to operate, while the other part is dormant, non-operational, or shut down. All I/O is redirected to the part of the cluster with quorum.
However, a cluster can also split into several partitions such that, under a majority quorum policy (more than 50% of the nodes), no partition can establish quorum. In that case any I/O on the distributed system may fail, or the system is put into read-only mode. Availability is sacrificed in this case.
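The quorum rule described here can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `has_quorum` and `partition_modes` are made up, and the choice of "read-only" (rather than failing the I/O outright) for a partition without quorum is just one of the two behaviours mentioned above.

```python
def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """Return True if the partition holds a strict majority (>50%) of all nodes."""
    # Integer comparison avoids floating-point issues: size > cluster_size / 2.
    return 2 * partition_size > cluster_size


def partition_modes(partition_sizes: list[int], cluster_size: int) -> list[str]:
    """Hypothetical policy: the quorum partition serves I/O, the rest are read-only."""
    return [
        "read-write" if has_quorum(size, cluster_size) else "read-only"
        for size in partition_sizes
    ]


# A 5-node cluster split 3/2: one side keeps quorum and continues to operate.
two_way_split = partition_modes([3, 2], cluster_size=5)

# The same cluster split 2/2/1: no partition has more than 50% of the nodes,
# so no side can accept writes and availability is sacrificed.
three_way_split = partition_modes([2, 2, 1], cluster_size=5)
```

Note that at most one partition can ever hold a strict majority, which is why a CP system can safely let that partition keep accepting writes.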
“AP – Availability/Partition Tolerance – Return the most recent version of the data you have, which could be stale. This system state will also accept writes that can be processed later when the partition is resolved. Choose Availability over Consistency when your business requirements allow for some flexibility around when the data in the system synchronizes. Availability is also a compelling option when the system needs to continue to function in spite of external errors (shopping carts, etc.)”
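To make the difference between the CP and AP choices concrete, here is a toy sketch of a single replica during a partition. It is not how any real database is implemented; the `Replica` class, its `mode` and `partitioned` fields, and `UnavailableError` are all names I invented for illustration.

```python
from dataclasses import dataclass, field


class UnavailableError(Exception):
    """Raised by a CP replica that cannot reach its peers."""


@dataclass
class Replica:
    mode: str                      # "CP" or "AP" (hypothetical setting)
    value: str = "v1"              # last value this replica knows about
    partitioned: bool = False      # True while cut off from its peers
    pending_writes: list = field(default_factory=list)

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise UnavailableError("cannot confirm the latest value")
        # AP: answer with the local copy, which could be stale.
        return self.value

    def write(self, new_value: str) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # CP: reject the write, since it cannot be replicated now.
                raise UnavailableError("cannot replicate the write")
            # AP: accept the write and remember it for later reconciliation.
            self.pending_writes.append(new_value)
        self.value = new_value


ap = Replica(mode="AP", partitioned=True)
ap.write("v2")            # accepted locally, queued for sync after the partition heals

cp = Replica(mode="CP", partitioned=True)
try:
    cp.read()
    cp_answered = True
except UnavailableError:
    cp_answered = False   # the CP replica gives up availability
```

The AP replica keeps serving requests and defers synchronization, while the CP replica returns an error until the partition is resolved.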
I would highly recommend reading the IEEE article “Consistency Tradeoffs in Modern Distributed Database System Design”.
PS: 1. This is my understanding. The information provided here may be inaccurate or immature. Please comment if you find anything misleading.
2. The text inside quotation marks (“ ”) is not my own wording; it is taken from the respective websites mentioned above.