Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

An Introduction to time series with Team Apache

Patrick McFadin (DataStax)
13:30–17:00 Wednesday, 1/06/2016
IoT & real-time
Location: Capital Suite 14 Level: Intermediate
Average rating: 4.09 (11 ratings)

Materials or downloads needed in advance

Attendees need a laptop with Java and Scala installed and a GitHub account. In preparation for the tutorial, there are a couple of prerequisites to complete.

Download and install DataStax Enterprise, which includes Apache Cassandra and Apache Spark. We’ll be using it to learn more about how each works. If you're running Windows, you’ll need to run it in a Linux VM locally. Make sure the guest VM's IP address is reachable from the host.

Check out the KillrWeather project on GitHub. You can follow the README instructions to get it running.

Description

We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.

Patrick walks you through organizing a stream of data into an efficient queue using Apache Kafka, processing the data in flight using Apache Spark Streaming, storing the data in a highly scalable and fault-tolerant database using Apache Cassandra, and transforming and finding insights in volumes of stored data using Apache Spark.

Topics include:

  • Understanding the right use case
  • Considerations when deploying Apache Kafka
  • Processing streams with Apache Spark Streaming
  • A deep dive into how Apache Cassandra stores data
  • Integration between Cassandra and Spark
  • Data models for time series
  • Postprocessing without ETL using Apache Spark on Cassandra
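
As a rough illustration of the "data models for time series" topic, here is a minimal sketch in plain Python of one common idea: bucket readings into partitions by source and day, and keep each partition ordered newest first. This is not taken from the tutorial materials or from KillrWeather. The names `partition_key` and `store`, the per-day bucket, and the sample weather station readings are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(station_id, ts):
    """Bucket a reading by station and UTC calendar day (hypothetical scheme)."""
    return (station_id, ts.strftime("%Y-%m-%d"))


def store(readings):
    """Group readings into partitions, newest first within each partition."""
    partitions = defaultdict(list)
    for station_id, ts, temperature in readings:
        partitions[partition_key(station_id, ts)].append((ts, temperature))
    for rows in partitions.values():
        # Sort by timestamp, descending, so recent readings come first.
        rows.sort(key=lambda row: row[0], reverse=True)
    return dict(partitions)


# Made-up sample data: (station id, timestamp, temperature).
readings = [
    ("station-1", datetime(2016, 6, 1, 13, 30, tzinfo=timezone.utc), 18.5),
    ("station-1", datetime(2016, 6, 1, 14, 0, tzinfo=timezone.utc), 19.1),
    ("station-1", datetime(2016, 6, 2, 9, 0, tzinfo=timezone.utc), 16.2),
]
table = store(readings)
```

Bucketing by day keeps any single partition from growing without bound, and the descending order makes "latest readings for a station" a read from the front of one partition. The right bucket size depends on the write rate, so the per-day choice here is only an example.
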

Patrick McFadin

DataStax

Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.

Comments on this page are now closed.

Comments

Anahita Saghafi Saghafi
1/06/2016 16:46 BST

Is there a URL to download the slides please?

Dawid Benski
28/05/2016 13:01 BST

Do I have to download the full DataStax Enterprise, or is the sandbox (e.g. VirtualBox) good enough? I’m not familiar with sbt, so the next question: does KillrWeather also run directly on Linux, or must it run on Windows? I want to avoid installing lots of stuff on my personal computer and keep everything together in the virtual machine.
Thanks!