
An Introduction to Real-Time Analytics with Cassandra and Hadoop

Patricia Gorla (The Last Pickle)
Hadoop & Beyond, Regent Parlor
Tutorial. Please note: to attend, your registration must include Tutorials on Monday.

Cassandra is a distributed storage system for managing lots of structured data over many commodity servers, while providing a highly-available service with no single point of failure.

Put another way, Cassandra is a solution to scaling out relational databases to the terabyte scale.

Cassandra’s append-only storage structure makes it well suited as an HDFS alternative for running large-scale MapReduce analytics on real-time data.

In this session, we will cover:

  • Introduction to Cassandra
  • Setting up a Cassandra Cluster with MapReduce
  • Pros and Cons of using Cassandra
  • Failure Mitigation: How to recover lost nodes
  • Performance Tuning

At the end of the tutorial, participants will have set up a multi-node Cassandra/Hadoop cluster, indexed data into the cluster at high volumes, and run analyses against the cluster.


Patricia Gorla

The Last Pickle

Patricia is a software consultant with OpenSource Connections. Starting with Python application development, Patricia moved to data analysis after becoming fascinated with machine learning.

From there, she has worked on many full-stack data projects: gathering and scrubbing the data, running analyses, and developing custom visualizations to lay out the information.

She is passionate about information retrieval, and loves tackling the challenges companies face with fast-growing datasets.


Comments

Rob Long
10/28/2013 5:58am EDT

The wifi is not holding up well. You might consider making the downloads available asap.

Patricia Gorla
10/27/2013 7:09pm EDT

Also, downloads for the labs will be posted during the class.

Patricia Gorla
10/27/2013 7:09pm EDT

Bill, you will need to download DataStax Enterprise from DataStax, and you will need a *NIX laptop.

If you are on a Windows machine, you can use any VM or remote server with Sun Java 6.

Bill Bejeck
10/27/2013 7:00pm EDT

Are there any required downloads/VM images for this session?
