The hash ring
LightCloud uses consistent hashing to create a hash ring. Consistent hashing is used by most key-value stores to make it possible to add and remove nodes without invalidating too many keys.
How consistent hashing works
Consistent hashing is fairly simple (and smart way of distributing keys). It can be best explained by the idea that you have a ring that goes from 0 to some big number. Given a node A, you find a placement for A on the ring by running hash_function(A), the hash_function should generally mix the values well - good candidates for the hash function are MD5 or SHA1. Given a (key, value) pair you find the key's placement on the ring by running hash_function(key). A node holds all the keys that have a value lower than itself, but greater than the preceding node.
Tom White has written a great blog post about consistent hashing, take a look at it, it explains the idea in much greater detail.
The specialties of LightCloud's hash ring
One of the big differences between LightCloud and other key-value databases is that it does not copy keys around to other nodes. Amazon's Dynamo (that's used to power Amazon S3) uses following replication approach:
A problem with this approach is that data is "eventually" consistent and one needs to build complex structures to route efficiently (stuff like routing tables, membership management etc.). We wanted to simply this and looked at ways to reuse TokyoTyrant's master-master replication. The solution we chose was to replicate nodes via master-master replication (instead of replicating keys on the ring):
A big problem with this approach is that one will do a lot of lookups if new nodes are added to the system. I.e. routing becomes an issue. To solve this without routing tables we introduced two hash rings - one for lookups and another one for storage nodes:
Nodes of LightCloud
Each LightCloud node is a TokyoTyrant server. These nodes are managed by LightCloud manager and a sample node is run in following way:
command = "ttserver -host %(host)s -port %(port)s "\ "%(master_cfg)s "\ "-pid %(data)s/%(name)s.pid "\ "-dmn "\ "-le -ulog %(data)s/logs/%(id)s/ -sid %(id)s "\ "-rts %(data)s/data/%(id)s.rts "\ "-ulim 128m "\ "-ext %(cur_pwd)s/extensions/our.lua "\ "-thnum 6 -tout 5 "\ "%(data)s/data/%(id)s.tch%(opts)s"
Communication with TokyoTyrant
LightCloud can communicate with TokyoTyrant in following ways:
- Using Tyrant's binary protocol
- Using memcached's protocol
LightCloud uses pytyrant as default (since it's binary and supports more features such as calling Lua extensions). But memcached can be used as well.