Everything open source
May 16–17, 2016: Training & Tutorials
May 18–19, 2016: Conference
Austin, TX

Becoming friends with Cassandra and Spark

Jon Haddad (DataStax), Dani Traphagen (GridGain)
9:00am–12:30pm Tuesday, 05/17/2016
Location: Meeting Room 18 A/B
Level: Intermediate
Average rating: 3.47 (15 ratings)

Prerequisite knowledge

Attendees should be familiar with the command line and Linux and have general experience with programming and databases.

Materials or downloads needed in advance

Attendees need a laptop with a minimum of 4 GB of RAM and VirtualBox or VMware installed, as well as a GitHub account.

You can download the VirtualBox files here: https://oscon2016-friends-with-cassandra.s3.amazonaws.com/oscon2016.zip

It has everything you need on it. We will be distributing the VM during the session as well, but it's going to be fastest if you download it beforehand.


Jon Haddad and Dani Traphagen explore all the basics you’ll need to become best buds with the radically scalable, always-on, and increasingly popular Apache Cassandra database. But wait, there’s more. Jon and Dani also cover using Apache Spark for large-scale data processing. Jon and Dani get you acquainted with these technologies and give you some added resources so you can dive even deeper. You’ll be able to take home the VM on a supplied flash drive so you have access to it whenever you want to explore further. After all is said and done, you’ll walk out with some shiny new knowledge and a couple of new pals.

Topics include:

  • Understanding what makes Cassandra tick: Jon and Dani talk about everything Cassandra under the sun, giving you a background on architecture and data modeling so you have a firm foundation moving forward.
  • Learning how to communicate: Jon and Dani look at the SQL-like query language, CQL, that will get you talking to Cassandra like a champ in the provided VM.
  • Learning to work together: Jon and Dani introduce the types of use cases Cassandra and Spark DataFrames are ideal for, then get hands-on with some Project Jupyter notebooks to try it out.

Jon Haddad


Jon Haddad has 15 years’ experience in both development and operations. For the last 10, he’s worked at various startups in southern California. For the last two years, he’s been the maintainer of cqlengine, the Python object mapper for Cassandra, now integrated into the native Cassandra driver. Jon is currently a technical evangelist at DataStax, where he continues to focus on advancing Cassandra in the Python, operations, and data science communities. Jon holds a degree in computer science from the University of Vermont.


Dani Traphagen


Dani Traphagen is a solution architect for GridGain, where she consults on high-tech caching architectures. Previously, Dani consulted at DataStax and led technical training internationally on Apache Cassandra and DataStax Enterprise. Her passion for teaching began while working in the Electrical Engineering and Computer Science department at the University of California, Berkeley, where she taught scientists technical skills, helped create a data science course, and raised awareness about the growing open science community. Dani has since volunteered with and generated training content for a number of organizations, including Software Carpentry, women in technology, rOpenSci, and GitHub. Earlier in her career, Dani worked in cartilage tissue engineering at the University of California, San Francisco, where her interests in heavy machinery, science, and code fused. If you don’t catch Dani behind a computer, you’ll often see her in the wild, backpacking, riding her bike, or climbing things. She also makes sure to keep the coffee business afloat in her hometown of Hermosa Beach.

Comments on this page are now closed.


Jon Haddad
05/20/2016 8:28am CDT

I’ve fixed the issue with the VM and reposted the files here. If you want to go through the exercises, you should be able to start the VM (please wait a minute for all the background services to start), then open the OSCON Spark tutorials.

You can download the fixed VM here:


Jon Haddad
05/20/2016 6:04am CDT

Ryan, the slides are linked above.


Ryan Guo
05/20/2016 5:42am CDT

What a pity the slides view is not available for me. Could you send me a copy by email? It would be much appreciated.

Cory Donnelly
05/17/2016 10:44am CDT

Amit, there’s now a slide link above. Alternatively here’s a link to all available slides: http://oscon.com/slides

Amit Lalloo
05/17/2016 8:24am CDT

Can I please have the URL to the presentation?

Dani Traphagen
05/17/2016 7:43am CDT

Hi Cory & André, I sent this over, should be posted soon! Cheers!

André Morrow
05/17/2016 7:33am CDT

Once we receive the presentation from the speaker we will be able to make the slides live at the top of this page.

Cory Donnelly
05/17/2016 7:25am CDT

Dani, at the conference you said you would re-upload the slides. Are they already online somewhere?

Dani Traphagen
05/15/2016 4:01am CDT

Since the conference wifi gods are generally angry gods, PLEASE help us help you by downloading the VM from the link above. This will make you so happy. I promise.

Jon Haddad
05/13/2016 8:44am CDT

You can download the VirtualBox files here: https://oscon2016-friends-with-cassandra.s3.amazonaws.com/oscon2016.zip

It has everything you need on it. We will be distributing the VM during the session as well, but it’s going to be fastest if you download it beforehand.

Dani Traphagen
05/06/2016 5:51pm CDT

Hi Bill,

We generally disseminate flash drives with the VM and post it here a week out. Fret not, this isn’t our first conference wifi rodeo. :)


Bill Harper
05/06/2016 5:41pm CDT

The prerequisites need a lot more work given how bad conference wi-fi can be. Just sayin…