Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Training provided by Databricks

Apache Spark Advanced Training

Sameer Farooqui (Databricks), Denny Lee (Concur Technologies), Christopher Fregly (Flux Capacitor)
9:00am - 5:00pm Tuesday, 02/17/2015
9:00am - 5:00pm Wednesday, 02/18/2015
Location: 211 C
9:00am - 5:00pm Thursday, 02/19/2015
Location: 211 B

This three-day curriculum features advanced lectures and hands-on technical exercises on advanced Spark usage for data exploration, analysis, and building Big Data applications. Course materials emphasize architectural design patterns and best practices for using Spark alongside other popular, complementary frameworks for building and managing Enterprise data workflows. During the tutorial, attendees will have opportunities to meet and talk with key members of the Spark development community, including Q&A sessions and whiteboarding of specific use-case questions. Participants will also receive limited free-tier accounts on Databricks Cloud.

Topics include:

  • Using cloud-based notebooks to develop Enterprise data workflows
  • Spark integration with Cassandra, Kafka, Elasticsearch
  • Advanced use cases with Spark SQL and Spark Streaming
  • Operationalizing Spark on DataStax, Cloudera, MapR, etc.
  • Monitoring and evaluating performance metrics
  • Estimating cluster resource requirements
  • Debugging and troubleshooting Spark apps
  • Case studies of production deployments of Spark
  • Preparation for Apache Spark developer certification exam

Instructor: Sameer Farooqui

Attendance is limited to 35 participants.


Since we will be covering advanced topics, everyone should have at least a solid understanding of Spark fundamentals. In class we will focus mostly on core Spark architecture, writing performant Spark code, and using Spark SQL and Spark Streaming. We will not spend much time on machine learning or MLlib, since that topic deserves a 3-day class of its own. However, the Databricks 3-day class you are signed up for will give you the right foundation to start exploring data science topics in MLlib after the class ends.

Here are the resources I recommend all students go through BEFORE the class starts (in order of importance):

While you are going through the material above, please write down any questions you have so we can address them in class (and so you remember to ask them).

Finally, remember to bring your laptop (Windows, OS X, or Linux) to class. You will not need Python, Java, or Spark pre-installed, as we will run all labs in Databricks Cloud.

Please note:

Attendees of this training can attend the Strata + Hadoop World evening networking events and also have access to the Expo Hall.

People who choose this training cannot attend any other sessions or tutorials at Strata + Hadoop World on Wednesday or Thursday. However, attendees may also purchase an exclusive Friday Pass for Strata + Hadoop World.

No discounts apply.
