Java Community News - What is Consistent Hashing and Why You Should Care


7 replies on 1 page. Most recent reply: Dec 12, 2007 2:31 AM by jurgen goelen


Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

What is Consistent Hashing and Why You Should Care

Posted: Nov 28, 2007 3:05 PM

Summary
Caching has become the primary way to scale data-intensive enterprise applications. When objects are distributed across caches, classes should define consistent hash functions, writes Tom White in a recent article explaining the benefits of consistent hashing.

Most distributed caching frameworks map each object to a cache node, and that mapping is performed by a hash function. To understand how to implement efficient caching, it is important to grasp the benefits of consistent hashing, argues Tom White in a recent article, Consistent Hashing.

While such hashing algorithms are likely implemented as part of the client of a caching library, such as memcached, developers still need to understand how to write consistent hashing for their objects, writes White:

Consistent hashing is needed to avoid swamping your servers... The need for consistent hashing arose from limitations experienced while running collections of caching machines—web caches, for example. If you have a collection of n cache machines then a common way of load balancing across them is to put object o in cache machine number hash(o) mod n.

This works well until you add or remove cache machines (for whatever reason), for then n changes and every object is hashed to a new location. This can be catastrophic since the originating content servers are swamped with requests from the cache machines. It's as if the cache suddenly disappeared. Which it has, in a sense.
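To make the remapping problem concrete, here is a minimal sketch. It is not taken from White's article: the class name ModuloRemapDemo and the method countMoved are made up for illustration, and the integers 0 to 9999 stand in for well-mixed hash(o) values.

```java
class ModuloRemapDemo {

    /**
     * Counts how many hash values land on a different cache index
     * when the number of cache machines changes from oldN to newN.
     * The values 0..count-1 are an assumed stand-in for hash(o).
     */
    public static int countMoved(int count, int oldN, int newN) {
        int moved = 0;
        for (int h = 0; h < count; h++) {
            // floorMod keeps the index non-negative even for negative hashes,
            // which matters for real hashCode() values.
            int before = Math.floorMod(h, oldN);
            int after = Math.floorMod(h, newN);
            if (before != after) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int count = 10_000;
        int moved = countMoved(count, 4, 5);
        System.out.printf("%d of %d hash values moved%n", moved, count);
    }
}
```

Going from 4 to 5 cache machines moves 8000 of the 10000 stand-in values, because a value keeps its slot only when it leaves the same remainder under both moduli. That is the "cache suddenly disappeared" effect the quote describes.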

Consistent hashing is used to avoid that sort of situation, and to aid in the efficient distribution of objects to caches:

Consistently maps objects to the same cache machine, as far as is possible, at least... The hash function actually maps objects and caches to a number range. This should be familiar to every Java programmer - the hashCode method on Object returns an int, in order for consistent hashing to be effective it is important to have a hash function that mixes well. Most implementations of Object's hashCode do not mix well - for example, they typically produce a restricted number of small integer values... MD5 hashes are recommended here.

In the article, White discusses a popular algorithm and its Java implementation to provide consistent hashing for objects.
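For readers who want a feel for the idea before reading the article, here is a rough sketch of a hash ring. It is not White's implementation. The class HashRingSketch, its method names, the "#" suffix scheme, and the replicas parameter (several ring points per cache, a common refinement) are all assumptions made for illustration. It places both caches and objects in one number range using the first 8 bytes of an MD5 digest, and assigns each object to the first cache point at or after its own position, wrapping around at the end.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class HashRingSketch {
    // Sorted map from ring position to cache name.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Assumed refinement: number of ring points per cache.
    private final int replicas;

    public HashRingSketch(int replicas) {
        this.replicas = replicas;
    }

    /** Folds the first 8 bytes of the MD5 digest into a long. */
    public static long md5(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(md5(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(md5(node + "#" + i));
        }
    }

    /** Returns the cache owning the key, or null if the ring is empty. */
    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        // First cache point at or after the key's position, else wrap around.
        Map.Entry<Long, String> e = ring.ceilingEntry(md5(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

With this layout, adding a cache only takes over the keys that fall just before its new ring points; every other key stays on the cache it already had, which is the "as far as is possible" behavior the quote refers to.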

What sort of guidelines do you follow when defining the hashCode() methods of your classes?

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005