The Artima Developer Community
Sponsored Link

Java Buzz Forum
Using Virtual Nodes to Compact Vector Clocks

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Brian McCallister

Posts: 1282
Nickname: frums
Registered: Sep, 2003

Brian McCallister is JustaProgrammer who thinks too much.
Using Virtual Nodes to Compact Vector Clocks Posted: Jul 29, 2008 9:20 PM
Reply to this message Reply

This post originated from an RSS feed registered with Java Buzz by Brian McCallister.
Original Post: Using Virtual Nodes to Compact Vector Clocks
Feed Title: Waste of Time
Feed URL: http://kasparov.skife.org/blog/index.rss
Feed Description: A simple waste of time and weblog experiment
Latest Java Buzz Posts
Latest Java Buzz Posts by Brian McCallister
Latest Posts From Waste of Time

Advertisement

One hiccup encountered when using vector clocks is that there is no inherent way of reducing the size of the clock. Basically, any node which acts as a causal agent of change has the potential to be forever recorded in the clock. This leads to unbounded clock size, with time. Most systems tend to have a limited number of causal, or lead, nodes providing clock values so it is avoided, but sometimes you don't have that.

When vector clocks are used to track causality in a storage system, such as in Amazon's Dynamo system, it becomes possible to create syncronization points in the history of the element, between storage nodes, if the storage nodes are able to form consensus between themselves on the value of an element at a specific point in the elements history. If we are talking an eventually consistent system, this can be done by using a background syncronization and merge algorithm which merges acausal changes in the background. Alternately, it could be client resolved, in systems like Dynamo, but that isn't my problem, so... I digress.

When the system believes it has a value at a given clock value, where the clock is causally related to the unified value on the other storage nodes holding the element, it can try to achieve concensus about this, and if successful, increment an artifical clock key which we'll call the epoch. If successful, the epoch value subsumes the vector clock values associated with the epoch in the element, shrinking the element's clock.

To run through an example, let's say we have a system which uses three storage nodes for each element. We don't care exactly how these elements values are assigned, except to recognize that it allows for non-causally related changes to occur. At a given point in time the storage nodes may have values for an element A, as follows:

NodeClock
red[red:2, green:1]
blue[red:2, green:1, blue:2]
green[red:3, green:1, blue:1]

A Paxos instance may be executed proposing that epoch:1 be [red:2, green:1]. As each node can agree that [red:2, green:1] comes before its value, it can accept the epoch value. Upon acceptance of the value, the clocks would become:

NodeClock
red[epoch:1]
blue[epoch:1, blue:2]
green[epoch:1, red:3, blue:1]

Assuming a background reconciliation protocol, a system could apply an appropriate heuristic to decide when to atempt to increment the epoch. A good example of such would be after unrelated values have been successfully merged. When it makes sense, and how to back-off to older clock values really depends on the characteristics of the system being designed and how it will be used.

As pointed out in the Dynamo paper, systems where there tend to be a small number of keys in the clock don't generally have this problem, Dynamo avoids it by causing the clock keys to be based on a small number of likely coordinator nodes:

To this end, Dynamo employs the following clock truncation scheme: Along with each (node, counter) pair, Dynamo stores a timestamp that indicates the last time the node updated the data item. When the number of (node, counter) pairs in the vector clock reaches a threshold (say 10), the oldest pair is removed from the clock. Clearly, this truncation scheme can lead to inefficiencies in reconciliation as the descendant relationships cannot be derived accurately. However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated.

In something like that, it may not make a lot of sense -- the problem just doesn't tend to come up. On the other hand, other systems, such as one which uses a user id, or session id, as a clock key would tend to generate larger clocks. This kind of keying can be useful for providing read-what-I-wrote consistency, but that is another discussion :-)

Read: Using Virtual Nodes to Compact Vector Clocks

Topic: OpenSSO Express on IdentityBuzz from PeopleOverProcess.com Previous Topic   Next Topic Topic: Session Clustering with Hazelcast Webapp Manager

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use