Consistent Hashing – part 1

This post is divided into two parts. This is Part 1; Part 2 is coming soon!

Consistent Hashing:-

Consistent Hashing is an algorithm that minimises the cost of rehashing in hash-based systems where scalability and availability are really important.

Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots/buckets. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped, because the mapping between keys and slots/buckets is defined by a modular operation.


In traditional hash tables, we find the position of an object o with a modular hash: bucket_no = hash(o) mod n, where n is the current number of buckets. This works perfectly until we add or remove a bucket. To see why, let's take the example of a caching system.

Cache system Example:-

Suppose we initially have three machines in our cache cluster (A, B, C) and have stored six values (using the name as the hash key) in our cache system. After applying the hash function, here is the state of the system:

Node A: key1, key4

Node B: key2, key5

Node C: key3, key6

What if a node D is added now? After rehashing, the state might look like this:

Node A: key2, key4

Node B: key3

Node C: key1

Node D: key5, key6

If you look closely, most of the keys have moved to a different node, so most lookups will now result in cache misses. This should not happen: only a few keys should move when nodes are added or removed. The consistent hashing algorithm solves this problem.
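To make the problem concrete, here is a small sketch of the traditional scheme. The key names and the choice of md5 are just for illustration (md5 gives us a deterministic hash, unlike Python's built-in hash for strings); the point is how many keys change buckets when a fourth node appears:

```python
import hashlib

def bucket_no(key: str, n: int) -> int:
    # Traditional placement: bucket_no = hash(key) mod n.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key{i}" for i in range(1, 1001)]
before = {k: bucket_no(k, 3) for k in keys}  # nodes A, B, C
after = {k: bucket_no(k, 4) for k in keys}   # node D added
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved to a different node")
```

With a uniform hash, a key stays put only when hash(key) mod 3 equals hash(key) mod 4, so roughly three quarters of the keys move.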


Consistent hashing is an algorithm in which the hash generated for a key is independent of the number of nodes. A key's hash is therefore constant, and each node is assigned a range of hash values.


Let’s look at this in more detail. The hash function actually maps keys and caches to a number range. This should be familiar to every Java programmer – the hashCode method on Object returns an int, which lies in the range -2^31 to 2^31-1. Imagine mapping this range onto a circle so the values wrap around. Here’s a picture of the circle with a number of objects (1, 2, 3, 4) and caches (A, B, C) marked at the points that they hash to (based on a diagram from Web Caching with Consistent Hashing by David Karger et al.):


To find which cache an object goes in, we move clockwise around the circle until we find a cache point. So in the diagram above, objects 1 and 4 belong in cache A, object 2 belongs in cache B, and object 3 belongs in cache C. Consider what happens if cache C is removed: object 3 now belongs in cache A, and all the other object mappings are unchanged. If another cache D is then added in the position marked, it will take objects 3 and 4, leaving only object 1 belonging to A.
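The clockwise lookup described above can be sketched with a sorted list of cache points and a binary search. The class and node names below are illustrative, and md5 stands in for whatever uniform hash function you prefer:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Deterministic hash onto the ring (md5 here, purely as an example).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted cache points on the circle
        self._node_at = {}  # point -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str):
        point = ring_hash(node)
        bisect.insort(self._points, point)
        self._node_at[point] = node

    def remove(self, node: str):
        point = ring_hash(node)
        self._points.remove(point)
        del self._node_at[point]

    def lookup(self, key: str) -> str:
        # Move clockwise: the first cache point at or after hash(key),
        # wrapping around to the start of the circle if necessary.
        i = bisect.bisect_left(self._points, ring_hash(key)) % len(self._points)
        return self._node_at[self._points[i]]
```

Removing a node now only remaps the keys that were on that node; every other key still finds the same cache point when it walks clockwise.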


This works well, except that the size of the interval assigned to each cache is pretty hit and miss. Since the placement of cache points is essentially random, the distribution of objects between caches can be very non-uniform.
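A quick way to see this skew is to place three cache points on the ring and measure the arc each one owns. The node names and ring size below are just for illustration; with only three points, the shares are usually far from an even third each:

```python
import hashlib

RING = 2**32  # illustrative ring size

def point(name: str) -> int:
    # Hash a cache name onto the ring.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING

pts = sorted(point(n) for n in ["cacheA", "cacheB", "cacheC"])
# Each point owns the arc back to the previous point, wrapping around.
arcs = [(pts[i] - pts[i - 1]) % RING for i in range(len(pts))]
for p, a in zip(pts, arcs):
    print(f"point {p}: owns {a / RING:.1%} of the ring")
```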

Examples of use:-

  1. Couchbase
  2. Partitioning component of Amazon’s storage system Dynamo
  3. OpenStack’s Object Storage Service Swift
  4. Data partitioning in Apache Cassandra
  5. Data Partitioning in Voldemort
  6. Akka’s consistent hashing router
  7. Riak, a distributed key-value database
  8. GlusterFS, a network-attached storage file system
  9. Skylable, an open-source distributed object-storage system
  10. Akamai Content Delivery Network
  11. Discord chat application
  12. Maglev: A Fast and Reliable Software Network Load Balancer