Riak: A decentralized key-value store

  • It looks like a nice distributed datastore. I can't tell how fast it is (my guess is 'pretty slow right now'), but it has all the right scaling properties.

    It works like a hash table whose keys are distributed across nodes based on their hash values. Each node owns a subset of the keyspace, and that subset can be reconfigured dynamically (so you can add nodes later without having to move everything). You can also replicate each key onto several different nodes for fault tolerance.
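
    To make the key placement idea concrete, here is a rough sketch of a consistent-hash ring. This is my own toy illustration, not Riak's actual code or API: the `HashRing` class, `points_per_node`, and the node names are all made up for the example.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring (hypothetical, not Riak's implementation)."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def add_node(self, node):
        # Each node claims several points so the keyspace is split evenly.
        for i in range(self.points_per_node):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def nodes_for(self, key, replicas=3):
        """Walk clockwise from the key's hash, collecting distinct nodes."""
        start = bisect_right(self._ring, (self._hash(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.nodes_for("user:42"))  # primary owner first, then the replicas
```

    The point of the ring is that adding a node only takes over the slices next to its new points; keys elsewhere keep their old owner.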

    It doesn't attempt to address transactions; instead, when clients make conflicting (branching) updates, it keeps all the branches. (Think of how Git works -- it handles fast-forwards automatically.) Anything else you have to merge yourself -- when you do a get and there are multiple branches, you get all the heads.
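
    Here is a minimal sketch of how I picture that behaving, using vector clocks to tell a fast-forward from a real branch. Everything in it (`SiblingStore`, `descends`, `merge_clocks`, the actor names) is hypothetical and only illustrates the idea; it is not Riak's client API.

```python
def descends(a, b):
    """True if vector clock `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def merge_clocks(clocks):
    """Combine clocks by taking the highest counter seen for each actor."""
    merged = {}
    for clock in clocks:
        for actor, count in clock.items():
            merged[actor] = max(merged.get(actor, 0), count)
    return merged


class SiblingStore:
    """Toy store that keeps every head when updates branch."""

    def __init__(self):
        self._data = {}  # key -> list of (clock, value) heads

    def get(self, key):
        return list(self._data.get(key, []))

    def put(self, key, value, actor, context=None):
        # The context is the clock the writer last read (None = blind write).
        clock = dict(context or {})
        clock[actor] = clock.get(actor, 0) + 1
        # Heads the new write has already seen are superseded (fast-forward);
        # heads it has not seen stay around as siblings.
        heads = [(c, v) for c, v in self._data.get(key, [])
                 if not descends(clock, c)]
        self._data[key] = heads + [(clock, value)]
        return clock


store = SiblingStore()
c1 = store.put("cart", ["milk"], actor="alice")
store.put("cart", ["milk", "eggs"], actor="bob", context=c1)   # fast-forward
store.put("cart", ["milk", "beer"], actor="alice", context=c1)  # branches
heads = store.get("cart")  # both heads come back; the caller must merge
merged_items = sorted({item for _, items in heads for item in items})
store.put("cart", merged_items, actor="carol",
          context=merge_clocks(c for c, _ in heads))
```

    The merge itself (a set union here) is the application's job; the store only has to notice that the merged write has seen both heads.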

    It's written in Erlang and has a pluggable storage backend, so you don't have to deal with ETS if you don't want to. (Hooray)

    It has a built-in MapReduce framework. The docs suggest that, for a given set of keys, it does as much of the work as possible on the nodes that hold those keys, minimizing transfer costs. That's a very nice property.
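
    As I read the docs, the locality idea looks roughly like the sketch below: group the keys by the node that owns them, run the map step against each node's local data, and ship only the mapped results to wherever the reduce runs. The `run_map_reduce` function, the `cluster` dict, and `owner_of` are stand-ins I invented; the real framework will differ.

```python
from collections import defaultdict


def run_map_reduce(cluster, keys, owner_of, map_fn, reduce_fn):
    """Toy data-local map/reduce over a dict of {node: {key: value}}."""
    # Group the requested keys by the node that stores them.
    keys_by_node = defaultdict(list)
    for key in keys:
        keys_by_node[owner_of(key)].append(key)

    shipped = []  # only mapped outputs cross the (imaginary) network
    for node, node_keys in keys_by_node.items():
        local_store = cluster[node]  # stands in for "runs on that node"
        for key in node_keys:
            shipped.extend(map_fn(key, local_store[key]))

    # The reduce step sees the small mapped results, not the raw values.
    return reduce_fn(shipped)


cluster = {
    "node-a": {"doc1": "riak is erlang"},
    "node-b": {"doc2": "keys hash onto nodes"},
}
placement = {"doc1": "node-a", "doc2": "node-b"}

total_words = run_map_reduce(
    cluster,
    keys=["doc1", "doc2"],
    owner_of=placement.get,
    map_fn=lambda key, text: [len(text.split())],
    reduce_fn=sum,
)
```

    Only one small integer per document leaves its node in this example, which is where the saving in transfer costs would come from.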

    I'm sort of excited about this project. I still want a non-relational distributed database that can do fast range queries over arbitrary properties -- I hate iterating over millions of items when I just want the latest 10. Give me the tools to define those indices over my data, and I'll be a happy man. (CouchDB comes closest for me so far...)

  • Man, so many to choose from: MongoDB, Riak, and HBase. Does anyone know of any comparisons?