Distributed Theory

One of the most important theories in distributed systems is the CAP theory. This theory was proposed by computer scientist Eric Allen Brewer from the University of California, Berkeley, in 2000. It was later proven by researchers from MIT in 2002, turning the hypothesis into a theorem.

The CAP theorem proposes three aspects for distributed systems:

  • Consistency: The data among multiple replicas remains consistent.
  • Availability: Each request can receive a normal, error-free response, but there’s no guarantee that the data is the most recent.
  • Partition Tolerance: Even if a node in the system fails, the system can still provide consistency and availability of services externally.

https://island-hexo.oss-cn-beijing.aliyuncs.com/202403241745676.png

For instance, consider a current system with two databases, DB0 and DB1.

Consistency means that when data changes (i.e., after a write operation), the data retrieved (i.e., read operation) is the same regardless of who accesses it or where it’s accessed from.

For example, when user 1 modifies data in DB0 through a write operation, the data retrieved by either user 1 or user 2, from either DB0 or DB1, should be exactly the same. This is what’s referred to as consistency.

In other words, if data in DB0 is modified, mechanisms should inform DB1 of the modification to ensure that the data remains consistent across different databases.

When a user sends a relevant request, both DB0 and DB1 will return the corresponding data, but it’s not necessary to ensure that the data is consistent.

If DB0 and DB1 encounter issues, such as network problems or other hardware issues, leading to a failure in communication between them, DB0 and DB1 become two partitions. Despite this communication failure, DB0 and DB1 can still provide services externally.

However, such situations are unavoidable in practical systems, making partition tolerance a necessary condition.

Since the CAP theorem’s three rules cannot be simultaneously satisfied, three scenarios arise, as depicted in the figure above: meeting any two rules results in CA, CP, and AP architectures. However, partition tolerance is mandatory, leaving us with CP and AP relationships.

Common CP software includes Zookeeper, which sacrifices availability for consistency. Zookeeper ensures consistent results can be obtained at any time, but it doesn’t guarantee that every service request can be fulfilled.

In AP architecture, data consistency isn’t as critical, allowing different services to return different data.

The CAP theory isn’t perfect and faces many challenges. For example, to maintain data consistency between DB0 and DB1, communication is required, which takes time. This can lead to situations where data is out of sync at certain times, such as in master-slave machines with significant master-slave delays, resulting in inconsistent data retrieval by users.

The CAP theory isn’t entirely a matter of choosing three out of two (or one of two), for instance, if the probability of partition tolerance occurrence is very low, sacrificing A and C isn’t necessary.

The BASE theory is considered an extension of the CAP theory, providing a balance between consistency and availability. In CAP, consistency refers to data remaining consistent at all times, known as strong consistency. As mentioned earlier, ensuring consistent data at all times is challenging. The BASE theory supplements this problem. Since achieving strong consistency is difficult, ensuring eventual consistency based on the system’s own business characteristics is also acceptable.

The BASE theory was proposed by eBay engineer Dan Pritchett.

BASE stands for “Basically Available, Soft State, and Eventually Consistent.”

Basically available means that the system can still be used even in the event of a failure, although there may be some problems compared to normal systems, such as response time and sacrificing some functionalities.

Soft state refers to the system allowing for intermediate states in data, such as the intermediate states during database master-slave synchronization.

Eventually consistent emphasizes that after going through the aforementioned soft states, the data eventually achieves consistency.

The BASE theory proposes sacrificing strong consistency of systems to ensure availability, allowing the system’s data to be inconsistent for a period but requiring eventual consistency.

Understanding CAP Theory of Distributed Systems in One Article

Distributed Architecture: CAP Theory/AP Architecture/CP Architecture