Consistency or Bust - Breaking a Riak Cluster

Data: NoSQL Databases
Location: Oregon Ballroom 203

NoSQL databases have become popular options for companies looking to augment their applications in ways that are extremely difficult to achieve with relational databases. With so many open source NoSQL offerings, each with its own strengths and weaknesses, it is difficult to determine where to start and what traps to watch out for. This workshop will delve into one of the lesser-known NoSQL databases, Riak, and demonstrate how it overcomes one of the major stumbling blocks of running a distributed system on commodity hardware.

Riak is a Dynamo-inspired NoSQL database built by Basho and currently used by Mozilla and Comcast, to name a few. It scales predictably and easily, and it makes it possible to quickly prototype, test, and deploy applications.

While all NoSQL systems are conceptually similar, each has its own flavor and implementation. Riak’s claim to fame, which sets it apart from Cassandra or MongoDB for example, is that it is a truly fault-tolerant system with no single point of failure, because no machine in a Riak cluster is central or special. This puts the onus on the application team to decide how much fault tolerance each solution needs.
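To make "deciding the amount of fault tolerance" concrete, here is a minimal sketch of the replica arithmetic used by Dynamo-style stores. It assumes a strict quorum over `n` replicas, with `r` replicas required to answer a read and `w` required to acknowledge a write. The function name and the returned keys are illustrative only and are not part of any Riak client API; a real Riak cluster can also accept writes on fallback nodes, which this simplification ignores.

```python
def tolerated_failures(n: int, r: int, w: int) -> dict:
    """Summarize what an (n, r, w) choice tolerates under a strict quorum.

    n: number of replicas kept for each key
    r: replicas that must respond for a read to succeed
    w: replicas that must acknowledge for a write to succeed
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # Replicas that can be down while reads still succeed.
        "read_failures_tolerated": n - r,
        # Replicas that can be down while writes still succeed.
        "write_failures_tolerated": n - w,
        # If r + w > n, every read quorum overlaps every write quorum,
        # so a read contacts at least one replica holding the latest write.
        "read_sees_latest_write": r + w > n,
    }
```

For example, `tolerated_failures(3, 2, 2)` reports that one replica may fail for both reads and writes while read and write quorums still overlap, whereas `tolerated_failures(3, 1, 1)` tolerates two failures but gives up the overlap guarantee.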

This workshop will attempt to prove Riak’s claim and see whether it can be broken. We will take a very basic Riak cluster consisting of N nodes and run tests that simulate a real-world example of how NoSQL databases are used and what data issues a failure can cause. More specifically, we will perform a large data import consisting of Twitter feed aggregations, and while the import is processing, we will shut down one or more of the nodes. When the import has completed, the running nodes will be queried to confirm that the data is synchronized, and then the nodes that were removed will be brought back online. Once this happens, the cluster will be queried again to show the “eventual consistency” of the data as it spreads across all the nodes.
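The shape of that experiment can be sketched with a toy, in-memory simulation. This is not Riak and uses no Riak client calls: `ToyNode`, `ToyCluster`, and `run_demo` are hypothetical names, key placement is a simple hash-modulo scheme rather than Riak's consistent-hashing ring, and writes missed by a down node are parked in a single cluster-level hint list as a stand-in for hinted handoff.

```python
import zlib


class ToyNode:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ToyCluster:
    """Toy model: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, node_count, replicas=3):
        self.nodes = [ToyNode(f"node{i}") for i in range(node_count)]
        self.replicas = min(replicas, node_count)
        self.hints = []  # (target node, key, value) writes awaiting replay

    def _owners(self, key):
        # Deterministic placement: hash the key to a starting node.
        start = zlib.crc32(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for node in self._owners(key):
            if node.up:
                node.data[key] = value
            else:
                # Remember the write so it can be replayed on recovery.
                self.hints.append((node, key, value))

    def get(self, key):
        # Values as seen by the owners that are currently running.
        return [n.data.get(key) for n in self._owners(key) if n.up]

    def recover(self, node):
        # Bring the node back and replay the writes it missed.
        node.up = True
        remaining = []
        for target, key, value in self.hints:
            if target is node:
                node.data[key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining

    def fully_replicated(self, keys):
        # True only if every owner of every key holds that key.
        return all(k in n.data for k in keys for n in self._owners(k))


def run_demo(node_count=5, total=1000, fail_at=200):
    cluster = ToyCluster(node_count)
    victim = cluster.nodes[1]
    keys = [f"tweet-{i}" for i in range(total)]
    for i, key in enumerate(keys):
        if i == fail_at:
            victim.up = False  # shut a node down mid-import
        cluster.put(key, f"aggregate {i}")
    before = cluster.fully_replicated(keys)
    cluster.recover(victim)  # bring the node back online
    after = cluster.fully_replicated(keys)
    return before, after
```

Calling `run_demo()` returns whether the data was fully replicated before and after the failed node rejoined. In this model the surviving owners keep answering reads throughout, and the recovered node catches up only when its missed writes are replayed, which is the behavior the workshop sets out to test against a real cluster.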

Throughout the workshop, we will also touch on a few scenarios in which traditional relational databases like MySQL or SQL Server cannot meet the need, but a distributed NoSQL cluster can.

With this newfound knowledge, attendees will be able to build a Riak cluster and be confident that it will be stable and redundant.

Jeffrey Kirkell

Project Management Institute

Jeff Kirkell is a technologist, geek, overall agile fanboy, and aspiring cynic who spends his time formulating ways to take the best bits of open source and proprietary technologies to create systems that wow the business side with their capabilities. Considering himself more an artist than an engineer, and priding himself on having installed more data and application systems on his laptop than anyone he knows, Jeff usually works in the Rich Internet Application space. Other interests include triple stores, key-value stores, and various SemWeb goodness, in addition to destroying systems for testing purposes and forcing widespread open source adoption.