The Artima Developer Community
Sponsored Link

Java Community News
What is Consistent Hashing and Why You Should Care

7 replies on 1 page. Most recent reply: Dec 12, 2007 2:31 AM by jurgen goelen

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 7 replies on 1 page
Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

What is Consistent Hashing and Why You Should Care Posted: Nov 28, 2007 3:05 PM
Reply to this message Reply
Summary
Caching has become the primary way to scale data-intensive enterprise applications. When objects are distributed across caches, classes should define consistent hash functions, writes Tom White in a recent article explaining the benefits of consistent hashing.
Advertisement

Most distributed caching frameworks are based on some form of mapping of an object to a cache node. Such mappings are performed by a hash function. To understanding how to implement efficient caching, it is important to grasp the benefits of consistent hashing, argues Tom White in a recent article, Consistent Hashing.

While such hashing algorithms are likely implemented as part of a client of a caching library, such as memcached, developers need to still understanding how to write consistent hashing for their objects, writes White:

Consistent hashing is needed to avoid swamping your servers... The need for consistent hashing arose from limitations experienced while running collections of caching machines—web caches, for example. If you have a collection of n cache machines then a common way of load balancing across them is to put object o in cache machine number hash(o) mod n.

This works well until you add or remove cache machines (for whatever reason), for then n changes and every object is hashed to a new location. This can be catastrophic since the originating content servers are swamped with requests from the cache machines. It's as if the cache suddenly disappeared. Which it has, in a sense.

Consistent hashing is used to avoid that sort of situation, and to aid in the efficient distribution of objects to caches:

Consistently maps objects to the same cache machine, as far as is possible, at least... The hash function actually maps objects and caches to a number range. This should be familiar to every Java programmer - the hashCode method on Object returns an int, in order for consistent hashing to be effective it is important to have a hash function that mixes well. Most implementations of Object's hashCode do not mix well - for example, they typically produce a restricted number of small integer values... MD5 hashes are recommended here.

In the article, White discusses a popular algorithm and its Java implementation to provide consistent hashing for objects.

What sort of guidelines do you follow when defining the hashCode() methods of your classes?


James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: What is Consistent Hashing and Why You Should Care Posted: Nov 29, 2007 11:13 AM
Reply to this message Reply
I've long followed the approach that IIRC was recommended by Josh Bloch in "Effective Java. That is, when multiple fields are part of the identity you use a an arbitrary distinct and 'reasonably large' prime number to multiply each field's hash (after the first field) and sum the results. For example:
class Point
{
   int x;
   int y;
 
   public int hashCode()
   {
      return x + (57 * y);
   }
}


The idea being that (1,2) and (2,1) will now produce different hashes and because the we used a prime number, there are less combinations that collide.

From the article, I think this creates hashes that "mix well". Is that correct?

Raoul Duke

Posts: 127
Nickname: raoulduke
Registered: Apr, 2006

Re: What is Consistent Hashing and Why You Should Care Posted: Nov 29, 2007 2:55 PM
Reply to this message Reply
on the face of it, i think this article is confusing because - as far as my little brain can tell - the "mod n" step (for e.g. web caching) has very little to do with the "hash(o)" step (for any call to Object.hashCode(), be it for web caching or just toString() or logging). fixing your hash(o) isn't going to do anything if you don't fix your "mod n". which one is the main point under discussion?

Ian Robertson

Posts: 68
Nickname: ianr
Registered: Apr, 2007

Re: What is Consistent Hashing and Why You Should Care Posted: Nov 29, 2007 7:40 PM
Reply to this message Reply
> I've long followed the approach that IIRC was recommended
> by Josh Bloch in "Effective Java. That is, when multiple
> fields are part of the identity you use a an arbitrary
> distinct and 'reasonably large' prime number to multiply
> each field's hash (after the first field) and sum the
> results.

This works well for creating a good distribution of objects into a hashed collection, but there is a danger with it as well. If any of the values of the object which contribute to the hash code change once the object is in the collection, that object will get "lost" - it will still be there, but in a different bucket than it should, so the collection will not be able to find it.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: What is Consistent Hashing and Why You Should Care Posted: Nov 30, 2007 6:21 AM
Reply to this message Reply
> This works well for creating a good distribution of
> objects into a hashed collection, but there is a danger
> with it as well. If any of the values of the object which
> contribute to the hash code change once the object is in
> the collection, that object will get "lost" - it will
> still be there, but in a different bucket than it should,
> so the collection will not be able to find it.

Yes, I am aware of this problem. It's only true if you use mutable values in the hash. It seemed to me from the article that this would be really hard to deal with with regard to distributed hashing so I assumed that the objects would need to have static identities.

I suppose you could use only an immutable subset of the values used in equals in your hash assuming there are any. Another solution I've used in the past is to make the collection an observer on the object and move it to the correct bucket automatically when the hashcode changes. Perhaps something similar could be done here.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: What is Consistent Hashing and Why You Should Care Posted: Nov 30, 2007 6:23 AM
Reply to this message Reply
> on the face of it, i think this article is confusing
> because - as far as my little brain can tell - the "mod n"
> step (for e.g. web caching) has very little to do with the
> "hash(o)" step (for any call to Object.hashCode(), be it
> for web caching or just toString() or logging). fixing
> your hash(o) isn't going to do anything if you don't fix
> your "mod n". which one is the main point under discussion?

I think the main point is how the objects are assigned. The side point about hashcodes is, from what I can tell, made because the 'circular' approach to distribution will perform worse than a mod approach for poorly distributed hashes.

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Consistent Hashing versus Dynamic Partitioning Posted: Dec 1, 2007 6:17 PM
Reply to this message Reply
Consistent hashing, while a step forward from the hashing performed by Memcached (etc.), is not the state of the art. Dynamic partitioning provides all of the benefits of consistent hashing, yet allows for a smooth (lossless) transition from one cluster view (specific set of members) to another, with no loss of data.

Peace,

Cameron Purdy | Oracle
http://www.oracle.com/technology/products/coherence/index.html

jurgen goelen

Posts: 1
Nickname: jgoelen
Registered: Dec, 2007

Re: Consistent Hashing versus Dynamic Partitioning Posted: Dec 12, 2007 2:31 AM
Reply to this message Reply
Do you know some good papers that compare "consistent hashing" and "dynamic partitioning"?

Flat View: This topic has 7 replies on 1 page
Topic: Adobe Releases Open-Source BlazeDS, Updates Flex and AIR Previous Topic   Next Topic Topic: Alex Ruiz on the Dangers of Mock Objects

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use