Riak 2.0: More like Cassandra

Riak 2.0 was released on September 2. Martin Fowler and Pramod Sadalage held up Riak as the archetypal key/value store in their seminal 2012 book NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Yet the new features in Riak 2.0 push it even closer to Cassandra, even though Cassandra is a columnar store, a completely different type of NoSQL database than a key/value store.

Previously, especially before version 1.4, Riak treated data opaquely: you stored whatever you wanted in the value as a BLOB, and Riak paid no attention to its contents. Riak 2.0 introduces new first-class atomic data types: sets and maps.

Riak 2.0 also introduces a continuum of options in the CAP theorem tradeoff between consistency and availability. In Riak 2.0, full consistency is possible. Cassandra has provided such tunable consistency for at least three years.
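To make the idea of a consistency "continuum" concrete, here is a minimal Python sketch of the quorum rule commonly used by Dynamo-style stores: with N replicas, a read that consults R of them is guaranteed to overlap a write acknowledged by W of them only when R + W > N. The function name and the sample values are illustrative assumptions, not Riak or Cassandra APIs, and the sketch does not model how either product actually implements its strongest consistency setting.

```python
def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must share at least one
    replica with every write quorum (the R + W > N rule)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    N = 3  # hypothetical replication factor
    for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
        if read_overlaps_write(N, r, w):
            outcome = "reads always see the latest acknowledged write"
        else:
            outcome = "stale reads are possible"
        print(f"N={N} R={r} W={w}: {outcome}")
```

Small R and W values favor availability and latency; larger values trade those away for a stronger guarantee that reads observe recent writes.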

What Riak offers that Cassandra does not is vector clocks, an algorithm invented in 1988 that lets distributed data stores reason about the causality of data changes by incrementing "logical clocks". Cassandra, in contrast, resolves conflicts based on system clock timestamps, which are subject to skew in the real world. DataStax says it has moved beyond vector clocks, arguing that they only detect conflicts and do not resolve them.
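A minimal Python sketch of the vector clock idea may help here. It is a textbook-style illustration, not Riak's implementation; the node names and helper functions are assumptions made up for the example. Each node increments its own counter on a write, and two versions are compared counter by counter to decide whether one descends from the other or whether they are concurrent.

```python
from typing import Dict

VClock = Dict[str, int]  # node name -> logical counter


def increment(clock: VClock, node: str) -> VClock:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify the causal relationship between two versions."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"  # a conflict: neither write knew about the other


def merge(a: VClock, b: VClock) -> VClock:
    """Combine two clocks by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


if __name__ == "__main__":
    base = increment({}, "node_a")       # first write, handled by node_a
    left = increment(base, "node_a")     # a later write through node_a
    right = increment(base, "node_b")    # a separate write through node_b
    print(compare(left, base))           # left causally follows base
    print(compare(left, right))          # neither version saw the other
    print(merge(left, right))
```

Note that the "concurrent" result only flags the conflict. Deciding which value wins, or how to combine the two, is still left to the application or to a merge rule, which is the limitation DataStax points to.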

There are other new Riak 2.0 features such as plug-in security and integration with Apache Solr for full-text search.

On the other hand, Cassandra has the SQL-like CQL, which is possible because Cassandra is a columnar store rather than a key/value store. Cassandra also has DataStax-supported Spark integration.

Basho touts predictable performance and reliability as Riak's strong points. Riak is implemented in Erlang, the language Ericsson used to achieve nine 9s of reliability.

With Riak 2.0, Riak is no longer a mere opaque key/value store with eventual consistency. Riak 2.0 brings atomic complex data types (sets and maps) and tunable consistency. And although it has map/reduce built in (jobs can be written in either Erlang or JavaScript), it still lacks the convenience of a SQL-like language such as CQL.