Make Data Work
Oct 15–17, 2014 • New York, NY

Owning Time Series With Team Apache: Cassandra, Spark, Spark Streaming, and Kafka

Patrick McFadin (DataStax), Helena Edelson (Apple)
9:00am–12:30pm Wednesday, 10/15/2014
Hadoop & Beyond
Location: 1 E05
Average rating: 2.80 (5 ratings)

Break out your laptops for this hands-on tutorial, which is geared around understanding the basics of how Apache Cassandra stores and accesses time series data. We’ll start with an overview of how Cassandra works and why that makes it a good fit for time series. Then we will add Apache Spark as an analytics companion. There will be coding as part of the hands-on tutorial. The goal is to take an example application and code through the different aspects of working with this unique data pattern. The final section will cover building an end-to-end data pipeline to ingest, process, and store high-speed time series data.

Hour 1: Core Concepts

  • Introduction to Apache Cassandra
  • Why Cassandra is used for storing time series data
  • Data models for time series
  • How Spark and Cassandra work so well together

Hour 2: Key Foundational Skills

  • Using Apache Cassandra
  • Creating the right development environment
  • Basic integration with Apache Spark and Cassandra

Hour 3: Integrating An End-To-End Data Pipeline

  • Technologies used: Spark, Spark Streaming, Cassandra, Kafka, Akka, Scala
  • After ingesting time series data into Kafka, we will first use Spark Streaming to store the raw data in Cassandra so that it can be replayed at any time and reused in multiple ways
  • Then we will apply Spark Streaming transformations and aggregations to the streaming data and store materialized views in Cassandra
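
The two pipeline steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the tutorial's code: the in-memory dictionaries stand in for Cassandra tables, each list passed to `process` stands in for one Spark Streaming micro-batch read from Kafka, and all names (`raw_table`, `daily_view`, `day_bucket`, `process`, `replay`) are hypothetical. Bucketing rows by source and day is one common way to model time series, assumed here for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for two Cassandra tables.
# raw_table:  (source_id, day) -> list of (timestamp, value) rows
# daily_view: (source_id, day) -> running aggregates for that day
raw_table = defaultdict(list)
daily_view = {}


def day_bucket(ts):
    """Return the UTC calendar day (YYYY-MM-DD) for an epoch-seconds timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def store_raw(batch):
    """Step 1: keep every event unchanged so it can be replayed later."""
    for source_id, ts, value in batch:
        raw_table[(source_id, day_bucket(ts))].append((ts, value))


def update_view(batch):
    """Step 2: fold one micro-batch into the per-day aggregate view."""
    for source_id, ts, value in batch:
        key = (source_id, day_bucket(ts))
        agg = daily_view.setdefault(
            key, {"count": 0, "total": 0.0, "high": value, "low": value}
        )
        agg["count"] += 1
        agg["total"] += value
        agg["high"] = max(agg["high"], value)
        agg["low"] = min(agg["low"], value)


def process(batch):
    """Handle one micro-batch: store the raw events, then update the view."""
    store_raw(batch)
    update_view(batch)


def replay():
    """Rebuild the aggregate view from the stored raw events alone."""
    daily_view.clear()
    for (source_id, _day), rows in raw_table.items():
        update_view([(source_id, ts, value) for ts, value in sorted(rows)])
```

Calling `process` once per batch keeps both structures up to date. Because the raw events are stored before any aggregation, `replay` can rebuild the view at any time, which is the reuse the outline describes.
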
Patrick McFadin

DataStax

Patrick McFadin is regarded as one of the foremost experts on Apache Cassandra and data modeling. As Chief Evangelist for Apache Cassandra and a consultant at DataStax, he has been involved in some of the biggest deployments in the world.

Helena Edelson

Apple

Helena Edelson is a committer to several open source projects, including the Spark Cassandra Connector and the Cassandra Kafka Connector, and a previous contributor to Akka (two new features in Akka Cluster), Spring Integration, and several others. She is also a speaker at international Big Data and Scala conferences: Kafka Summit, Spark Summit (EU and NYC), Strata (NYC and San Jose), Reactive Summit, QCon SF, Scala Days (EU and US), Scala World, and Philly Emerging Technology. She is currently a Senior Software Engineer in Distributed Systems at Apple.

Comments on this page are now closed.

Comments

Varun Sharma
11/04/2014 12:13am EST

Can we get the session video or presentation for this talk?
It would be really helpful. sharma.varun@flipkart.com

Thanks

Don Chang
11/02/2014 4:46am EDT

Can you share your presentation materials with me at donchang@hanmail.net?
Thank you in advance.

David Zeng
10/14/2014 9:01am EDT

Do we need to download any files in advance? We are warned that there will not be enough bandwidth for download.

Rao Kasinadhuni
10/03/2014 10:45am EDT

What does it mean if the session shows as “SOLD OUT”? Can’t we attend?

Thanks!