In a recent article, Amazon.com CTO Werner Vogels discusses a common pattern in scalable enterprise systems: the eventual consistency of data.
Eric Brewer, a Berkeley computer scientist and former head of Inktomi, noted an important rule of distributed data management systems: of the following three properties, only two can be achieved at any given time:
Data consistency,
System availability, and
Tolerance of network partitioning.
Vogels penned a thorough essay, Eventually Consistent, on how these three properties can be managed in a practical distributed system.
Vogels notes that a simple model in which all consumers and producers of data see exactly the same information has been impractical since the 1970s. More recently, the availability of data has become an even more important requirement of Web-based systems than exact and constant data consistency.
Vogels examines the various ways of looking at data consistency in a distributed system, noting that:
There are two ways of looking at consistency. One is from the developer / client point of view; how they observe data updates. The second way is from the server side; how updates flow through the system and what guarantees systems can give with respect to updates.
In the process of examining the practical aspects of managing lots of data in a distributed, highly available manner, Vogels defines several terms, such as eventual consistency:
The storage system guarantees that if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value. The most popular system that implements eventual consistency is DNS, the domain name system. Updates to a name are distributed according to a configured pattern and in combination with time-controlled caches; eventually all clients will see the update.
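The guarantee described above can be illustrated with a minimal sketch (the class and method names here are hypothetical, not from Vogels' essay): a primary accepts writes immediately, while a read replica only sees them once pending updates are shipped, so reads from the replica are stale during the inconsistency window.

```python
class EventuallyConsistentStore:
    """Toy model of eventual consistency: one primary, one read replica.

    Writes land on the primary immediately but reach the replica only
    when replicate() runs, so replica reads can return stale values
    until the inconsistency window closes.
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # updates not yet shipped to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Closing the inconsistency window: ship all pending updates.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("name", "new-value")
stale = store.read_from_replica("name")   # None: window still open
store.replicate()
fresh = store.read_from_replica("name")   # "new-value": window closed
print(stale, fresh)
```

DNS behaves analogously: cached answers keep serving the old record until their TTL expires, after which resolvers pick up the update.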
Noting that eventual consistency has become an important practical fact not only of highly distributed systems, but also of enterprise databases, Vogels writes that,
Eventual consistency is not some esoteric property of extreme distributed systems. Many modern RDBMS systems that provide primary-backup reliability implement their replication techniques in both synchronous and asynchronous modes. In synchronous mode the replica update is part of the transaction; in asynchronous mode the updates arrive at the backup in a delayed manner, often through log shipping. In the latter mode, if the primary fails before the logs are shipped, reading from the promoted backup will produce old, inconsistent values. Also, to support better scalable read performance, RDBMS systems have started to provide reading from the backup, which is a classical case of providing eventual consistency guarantees, where the inconsistency window depends on the periodicity of the log shipping.
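The failure mode Vogels describes can be sketched as follows (a hypothetical model, not any particular RDBMS's API): in synchronous mode the backup is updated as part of the write, while in asynchronous mode updates accumulate in a log until it is shipped, so failing over before shipping exposes old values.

```python
class PrimaryBackupDB:
    """Sketch of synchronous vs asynchronous primary-backup replication.

    Synchronous mode updates the backup as part of the transaction;
    asynchronous mode defers updates to a log that reaches the backup
    only when ship_logs() runs (log shipping).
    """

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.backup = {}
        self.log = []

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            self.backup[key] = value       # replica update is part of the transaction
        else:
            self.log.append((key, value))  # deferred until the next log shipment

    def ship_logs(self):
        for key, value in self.log:
            self.backup[key] = value
        self.log.clear()

    def fail_over(self):
        # Promote the backup; any unshipped log entries are lost.
        self.primary = self.backup


# Asynchronous mode: the primary fails before the latest log is shipped.
db = PrimaryBackupDB(synchronous=False)
db.write("balance", 100)
db.ship_logs()
db.write("balance", 150)           # never shipped
db.fail_over()
print(db.primary.get("balance"))   # 100: the promoted backup serves the old value
```

Reading from the backup for read scalability has the same shape: the replica lags the primary by up to one log-shipping interval, which is exactly the inconsistency window.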
What do you think of Vogels' notion of eventual consistency of data? What level of data consistency is required of the systems you're working on?