Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Conquer the time series data pipeline with SMACK

Patrick McFadin (DataStax)
9:00am–12:30pm Tuesday, 09/27/2016
IoT & real-time
Location: 1 E 06
Level: Intermediate
Tags: real-time

Prerequisite knowledge

  • Experience programming in Java or Scala
  • A general understanding of databases
  • Familiarity with the Linux command line
Materials or downloads needed in advance

  • A laptop that can run 64-bit virtual machines (with virtualization extensions turned on in the BIOS)
  • A running copy of DataStax Enterprise (Cassandra + Spark running together).

    Please follow the installation instructions at https://academy.datastax.com/strata-download-datastax-enterprise.

    • First, download DataStax Enterprise 5 and follow the instructions for the download. If you prefer to use a VM for isolation, create a small Linux VM (2 GB of memory, 1–2 CPUs) running CentOS, Red Hat, or Ubuntu. Do not download the Sandbox, as it won't allow you to run the Scala application.
    • Second, you will get access to the DataStax Academy Slack. If you have any questions, hop into the #strata channel and alert me with @patrick.
What you'll learn

  • Understand how to evaluate time series problems with Apache Spark, Cassandra, or Kafka
Description

We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, while users expect an always-on experience, leaving little room for error. Patrick McFadin explores how successful companies do this every day with powerful data pipelines built with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka. You’ll learn how to organize a stream of data into an efficient queue using Apache Kafka; process the data in flight using Apache Spark Streaming and Akka; store the data in a highly scalable, fault-tolerant database using Apache Cassandra; transform and find insights in volumes of stored data using Apache Spark; and keep these resources working together with Mesos.

Topics include:

  • Understanding the right use case
  • Considerations when deploying Apache Kafka
  • Processing streams with Apache Spark Streaming
  • Deep dive into how Apache Cassandra stores data
  • Integration between Cassandra and Spark
  • Data models for time series
  • Postprocessing without ETL using Apache Spark on Cassandra
  • Understanding how Mesos can make everything work together efficiently
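
The "Data models for time series" topic can be illustrated with a widely used Cassandra pattern: partition readings by their source plus a time bucket, so that no single partition grows without bound, and order rows inside each partition by timestamp. The Scala sketch that follows only models that layout in memory. The `Reading` type, the one-day UTC bucket, and the names `TimeSeriesBucketing`, `dayBucket`, `partitionKey`, and `partitions` are hypothetical assumptions for illustration, not the schema or code used in the tutorial, and no Cassandra or Spark API is called.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical reading type for illustration; the tutorial's real schema may differ.
final case class Reading(sensorId: String, timestamp: Instant, value: Double)

object TimeSeriesBucketing {
  // Formats an instant as its UTC calendar day, e.g. "2016-09-27".
  private val dayFormat: DateTimeFormatter =
    DateTimeFormatter.ISO_LOCAL_DATE.withZone(ZoneOffset.UTC)

  // Assumed bucket size: one UTC day per partition.
  def dayBucket(ts: Instant): String = dayFormat.format(ts)

  // Composite partition key (sensor, day), mirroring a CQL layout such as
  // PRIMARY KEY ((sensor_id, day), ts).
  def partitionKey(r: Reading): (String, String) =
    (r.sensorId, dayBucket(r.timestamp))

  // Groups readings the way they would be spread across partitions:
  // one group per (sensor, day), rows ordered newest first, like a
  // clustering column declared WITH CLUSTERING ORDER BY (ts DESC).
  def partitions(readings: Seq[Reading]): Map[(String, String), Seq[Reading]] =
    readings.groupBy(partitionKey).map { case (key, rows) =>
      key -> rows.sortBy(_.timestamp.toEpochMilli)(Ordering[Long].reverse)
    }
}
```

For example, three readings from one sensor, two on September 27 and one just after midnight UTC on September 28, end up in two partitions, one per day. The bucket size is the main design choice: larger buckets mean fewer partitions to read for a time range, while smaller buckets keep partitions bounded for high-rate sensors.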

Patrick McFadin

DataStax

Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.


Comments

Patrick McFadin
09/25/2016 1:01pm EDT

Hi everyone!

For Tuesday’s tutorial session, you’ll need a running copy of DataStax Enterprise (Cassandra + Spark running together).

Please follow this link: https://academy.datastax.com/strata-download-datastax-enterprise

I’ve set up this special page so you can take care of a couple of things. First, you will need to download DataStax Enterprise 5. Follow the instructions for the download. If you prefer to use a VM for isolation, create a small Linux VM (2 GB of memory, 1–2 CPUs) running CentOS, Red Hat, or Ubuntu. Do not download the Sandbox, as it won’t allow you to run our Scala application.

Second, you will get access to our DataStax Academy Slack. If you have any questions, hop into the #strata channel and alert me with @patrick.

See you Tuesday morning!

Patrick

09/25/2016 6:10am EDT

Where do we download the VM?