Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra

Patrick McFadin (DataStax)
1:30pm–5:00pm Tuesday, 09/29/2015
IoT & Real-time
Location: 3D 04/09 Level: Advanced
Average rating: 4.53 (15 ratings)

Materials or downloads needed in advance

Mac or Linux laptop, or a Linux VM on Windows. Java and Scala installed.


We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits at our data centers that have to be processed and stored. In addition, users expect an always-on experience, with little room for error. Numerous successful companies are doing this every day, and I can show you how.

In this tutorial session, we will cover the powerful Team Apache: Apache Kafka, Spark, and Cassandra. You’ll learn how to organize a stream of data into an efficient queue using Apache Kafka, process the data in flight using Apache Spark Streaming, store it in a highly scalable and fault-tolerant database using Apache Cassandra, and transform and find insights in volumes of stored data using Apache Spark. Topics we will discuss:

  • Understanding the right use case
  • Considerations when deploying Apache Kafka
  • Processing streams with Apache Spark Streaming
  • Deep dive into how Apache Cassandra stores data
  • Integration between Cassandra and Spark
  • Data models for Time Series
  • Post-processing without ETL using Apache Spark on Cassandra

There is a lot to cover in three hours, so get ready and bring your laptop.


Patrick McFadin


Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.

Comments on this page are now closed.


Patrick McFadin
09/25/2015 5:09pm EDT

Hi everyone! In preparation for our tutorial next week, I have a couple of prerequisites for you to complete.

First: Download and install DataStax Enterprise here; it includes Apache Cassandra and Apache Spark. We’ll be using it to learn more about how each works. If you are running Windows, you’ll need to run it in a local Linux VM. Make sure the guest’s IP address is reachable from the host.

Second: Check out the KillrWeather project from GitHub here. You can follow the README instructions to get it running.

See everyone there!