Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

An introduction to time series with Team Apache (Half Day)

Patrick McFadin (DataStax)
1:30pm–5:00pm Tuesday, 03/29/2016
IoT and Real-time

Location: LL21 A
Tags: real-time
Average rating: 4.07 (14 ratings)

Prerequisite knowledge

*BEFORE* you arrive onsite, please do the following:

First: Download and install DataStax Enterprise, which includes Apache Cassandra and Apache Spark. We'll use it to learn how each one works. If you are running Windows, you'll need to run it in a local Linux VM; make sure the guest's IP address is reachable from the host.

Second: Check out the KillrWeather project from GitHub and follow the README instructions to get it running.

Attendees should be familiar with data engineering concepts. This tutorial uses Java and Python.

Materials or downloads needed in advance

Download the prerequisite tools (https://academy.datastax.com/strata-download-datastax-enterprise) before attending.

Description

We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers, and that data has to be processed and stored while users expect an always-on experience that leaves little room for error. Patrick McFadin gives an overview of the powerful Team Apache: Apache Kafka, Spark, and Cassandra. Attendees will learn how to organize a stream of data into an efficient queue using Apache Kafka, process the data in flight using Apache Spark Streaming, store the data in a highly scalable, fault-tolerant database using Apache Cassandra, and transform and find insights in large volumes of stored data using Apache Spark.
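To show how the four stages fit together, here is a toy model in plain Python. It is a sketch under loud assumptions, not the session's code, and it calls no Kafka, Spark, or Cassandra API. A `deque` stands in for a Kafka topic. `micro_batches` stands in for Spark Streaming's batching. A dictionary keyed by `(station_id, batch_start)` stands in for a Cassandra table. All names (`Reading`, `micro_batches`, `process`, `store`) are hypothetical. One simplification to note: readings are bucketed by their own timestamp, whereas Spark Streaming cuts DStream batches by arrival time.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    station_id: str     # hypothetical weather-station identifier
    ts: int             # seconds since epoch
    temperature: float


def micro_batches(queue, batch_seconds):
    """Drain the queue (stand-in for a Kafka topic) into fixed-length
    batches keyed by batch start time (stand-in for Spark Streaming).
    Simplification: bucketed by reading timestamp, not arrival time."""
    batches = defaultdict(list)
    while queue:
        r = queue.popleft()
        start = r.ts - r.ts % batch_seconds
        batches[start].append(r)
    return dict(sorted(batches.items()))


def process(batch):
    """Per-batch transformation: average temperature per station."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in batch:
        acc = sums[r.station_id]
        acc[0] += r.temperature
        acc[1] += 1
    return {sid: total / n for sid, (total, n) in sums.items()}


def store(table, batch_start, aggregates):
    """Write results keyed by (station_id, batch_start), mimicking a
    partition key plus a clustering column in a Cassandra table."""
    for sid, avg in aggregates.items():
        table[(sid, batch_start)] = avg


# Usage: four readings, 10-second batches.
queue = deque([
    Reading("ws-1", 0, 10.0),
    Reading("ws-1", 5, 14.0),
    Reading("ws-2", 7, 20.0),
    Reading("ws-1", 12, 8.0),
])
table = {}
for start, batch in micro_batches(queue, 10).items():
    store(table, start, process(batch))
print(table)  # -> {('ws-1', 0): 12.0, ('ws-2', 0): 20.0, ('ws-1', 10): 8.0}
```

In the real stack, each stage is a separate, independently scalable system. The model only shows the shape of the data as it moves from queue to batch to keyed storage.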

Topics include:

  • Understanding the right use case
  • Considerations when deploying Apache Kafka
  • Processing streams with Apache Spark Streaming
  • A deep dive into how Apache Cassandra stores data
  • Integration between Cassandra and Spark
  • Data models for time series
  • Post-processing without ETL using Apache Spark on Cassandra
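The time series data modeling topic can be illustrated with a small in-memory model of a common Cassandra pattern. Each partition is keyed by sensor plus a time bucket, so no single partition grows without bound. Within a partition, rows are clustered by timestamp in descending order, so "latest N readings" becomes a slice of one partition. This is a hypothetical sketch, not a Cassandra client and not necessarily the schema KillrWeather uses. The layout it mimics would be roughly `PRIMARY KEY ((station_id, day), ts)` with `CLUSTERING ORDER BY (ts DESC)`. The names `TimeSeriesTable`, `partition_key`, `insert`, and `latest` are illustrative.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


class TimeSeriesTable:
    """Hypothetical in-memory model of PRIMARY KEY ((station_id, day), ts)
    with ts clustered in descending order."""

    def __init__(self):
        # partition key -> (sorted list of -ts, parallel list of values);
        # storing -ts keeps the newest row first (clustering order DESC)
        self._partitions = defaultdict(lambda: ([], []))

    @staticmethod
    def partition_key(station_id, ts):
        # Bucket by UTC day so one station's partition stays bounded
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        return (station_id, day)

    def insert(self, station_id, ts, value):
        keys, values = self._partitions[self.partition_key(station_id, ts)]
        i = bisect.bisect_left(keys, -ts)
        if i < len(keys) and keys[i] == -ts:
            values[i] = value  # same primary key: overwrite (upsert semantics)
        else:
            keys.insert(i, -ts)
            values.insert(i, value)

    def latest(self, station_id, day, limit):
        # Single-partition slice: newest `limit` rows, no other partition read
        keys, values = self._partitions.get((station_id, day), ([], []))
        return [(-k, v) for k, v in zip(keys[:limit], values[:limit])]


DAY0 = 1459209600  # 2016-03-29 00:00:00 UTC
t = TimeSeriesTable()
t.insert("ws-1", DAY0, 15.0)
t.insert("ws-1", DAY0 + 3600, 16.5)
t.insert("ws-1", DAY0 + 7200, 18.0)
t.insert("ws-1", DAY0 + 3600, 17.0)   # same key: replaces 16.5
t.insert("ws-1", DAY0 + 86400, 12.0)  # next UTC day: new partition
print(t.latest("ws-1", "2016-03-29", 2))  # -> [(1459216800, 18.0), (1459213200, 17.0)]
```

The day-sized bucket is a design choice for this example. A suitable bucket size depends on how many readings each sensor writes per unit of time.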

Patrick McFadin

DataStax

Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.


Comments

Elpida Tzortzatos
04/05/2016 1:16am PDT

Can you share the slides from this session?

Jeremy Oldfather
03/29/2016 1:26am PDT

Should we install DSE as a Cassandra, Search, or Analytics node? The installer for OS X makes me choose.