July 20–24, 2015
Portland, OR

Data modeling Cassandra using CQL3

Mike Biglan (Twenty Ideas), Elijah Hamovitz (Analytic Spot)
5:00pm–5:40pm Wednesday, 07/22/2015
Data D137/138
Slides:   1-BIN    2-PDF 

Prerequisite Knowledge

Basic understanding of databases

Description

Data modeling is usually independent of the query language used to implement the model. When CQL3 was introduced, however, it added a relational-database-centric abstraction that hides many key details of the underlying storage. Worse, many data modeling articles reference the deprecated Thrift interface, which makes it hard to apply their advice in CQL3. Though CQL can be an efficient and convenient tool for querying, knowing how CQL actually maps to Cassandra’s storage structure is key to creating scalable and flexible data models.

As demand grows for highly scalable applications (e.g. Internet of Things, "collect everything" Big Data), Cassandra is an excellent fit when high-concurrency writes, a masterless architecture, near-linear scalability, and user-chosen consistency are a must. Within the broad family of NoSQL data stores, Cassandra is distinct from the document-store model of MongoDB.

Data modeling experience from relational databases or document stores does not transfer well to Cassandra; some anti-patterns of those systems are core patterns in Cassandra. Data modeling in relational databases is primarily about normalizing the data being stored. With Cassandra, data modeling must also take into account the primary access patterns and, in many cases, denormalize certain areas.
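As a rough illustration of modeling around access patterns, the sketch below uses plain Python dictionaries as stand-ins for two query-specific tables. The `CommentStore` class and its field names are hypothetical and not taken from the talk; the point is only that the same comment is written twice, once per query it must serve, instead of being stored once and joined at read time.

```python
from collections import defaultdict


class CommentStore:
    """Toy stand-in for two denormalized, query-specific tables."""

    def __init__(self):
        # "Table" answering: all comments written by a given user.
        self.by_user = defaultdict(list)
        # "Table" answering: all comments attached to a given post.
        self.by_post = defaultdict(list)

    def add(self, user, post, text):
        # One logical write fans out to every structure that serves a query.
        self.by_user[user].append((post, text))
        self.by_post[post].append((user, text))


store = CommentStore()
store.add("ann", "p1", "hi")
store.add("bob", "p1", "yo")
store.add("ann", "p2", "again")
```

Each read is then a single-key lookup (`store.by_user["ann"]` or `store.by_post["p1"]`), at the cost of extra writes and duplicated data.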

Topics we will cover include:

  • Composite keys in CQL3 versus how they are stored
  • Cassandra collections (set, list, and map). Spoiler: each element is stored as a separate column
  • What queries are allowed based on composite keys and secondary indexes
  • How to enable common patterns of other typically desired queries (e.g. inverted indexes)
  • Using clustering keys
  • Partition key choices and avoiding hotspots
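
The first two topics can be sketched in a few lines of Python. This is a simplified model, under stated assumptions, of the storage layout commonly described for the pre-3.0 engine: each non-key column of a CQL row becomes one cell in its partition, named by the clustering values followed by the column name, and each set or map element becomes its own cell. The table shape, column names, and helper functions are hypothetical; timestamps, row markers, and lists (which use a time-based component in the cell name) are not modeled.

```python
from collections import defaultdict


def to_storage_cells(row, partition_key, clustering_keys):
    """Flatten one CQL-style row into (partition, cell_name, value) triples."""
    partition = tuple(row[k] for k in partition_key)
    prefix = tuple(row[k] for k in clustering_keys)
    key_columns = set(partition_key) | set(clustering_keys)
    cells = []
    for column, value in row.items():
        if column in key_columns:
            continue  # key values live in the partition key / cell-name prefix
        if isinstance(value, dict):
            # map: one cell per entry, the map key is part of the cell name
            for map_key, map_value in value.items():
                cells.append((partition, prefix + (column, map_key), map_value))
        elif isinstance(value, (set, frozenset)):
            # set: one cell per element, the element is in the name, value empty
            for element in value:
                cells.append((partition, prefix + (column, element), ""))
        else:
            # regular column: a single cell (lists are NOT modeled here)
            cells.append((partition, prefix + (column,), value))
    return cells


def build_partitions(rows, partition_key, clustering_keys):
    """Group cells by partition and keep them sorted by cell name."""
    partitions = defaultdict(dict)
    for row in rows:
        for partition, name, value in to_storage_cells(
            row, partition_key, clustering_keys
        ):
            partitions[partition][name] = value
    return {p: dict(sorted(cells.items())) for p, cells in partitions.items()}


# Hypothetical table: PRIMARY KEY ((sensor_id, day), ts)
example_rows = [
    {"sensor_id": "s1", "day": "2015-07-22", "ts": 2, "temp": 21.5,
     "tags": {"indoor"}},
    {"sensor_id": "s1", "day": "2015-07-22", "ts": 1, "temp": 20.0,
     "tags": {"indoor", "lab"}},
]
layout = build_partitions(example_rows, ("sensor_id", "day"), ("ts",))
```

Both CQL rows land in the single partition `("s1", "2015-07-22")`, ordered by the clustering column `ts`. Including `day` in the partition key is one assumed way to keep any single partition from growing without bound or becoming a hotspot.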

Mike Biglan

Twenty Ideas

Mike Biglan, an avid technologist, currently heads two Oregon-based startups: Analytic Spot (analyticspot.com), a SaaS product for collecting and visualizing educational analytics for mobile apps, and Twenty Ideas (twentyideas.com), a software consulting, architecture, and development agency. Before that, Mike was the lead technologist of two highly successful companies: Silicon Valley startup Happy Bits, maker of the AnyVideo/Evercam/Joya line of iOS and Android apps (e.g. http://bit.ly/18yaQVa); and Concentric Sky (concentricsky.com), where he oversaw a quadrupling of the size of the company, achieved high employee retention, and provided a strong and collaborative culture. With a background in software architecture and development, computer science, bioinformatics, statistics, education, psychology, and economics, Mike returned to Eugene in 2005 after living in Chicago (B.A. in economics at the University of Chicago), Washington DC, and San Diego (M.S. in Computer Science and Engineering at UCSD). He has also served on several boards, such as the Promise Neighborhoods Research Consortium (promiseneighborhoods.org) and the City of Eugene’s budget committee, and has spoken at OSCON and DjangoCon.


Elijah Hamovitz

Analytic Spot

Elijah Hamovitz has been developing web applications since 2010. He specializes in building Python and JavaScript applications, both client- and server-side, and has had the opportunity to familiarize himself with a huge variety of frameworks, platforms, and paradigms.


Comments

Mike Biglan
07/30/2015 9:29am PDT

Sure thing, added a PDF

Suyash Gandhi
07/30/2015 2:59am PDT

Can you please provide a PowerPoint-compatible slide deck?

Michael Aliotti
07/25/2015 2:51am PDT

The one suggestion for improvement I have is for the speakers to make sure they repeat ALL questions from the audience when they are not asked through the microphone. This was done some of the time, but not consistently for every question. Other than that, it was great!