Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Schedule: Spark and beyond sessions

1:30pm–5:00pm Tuesday, December 5, 2017
Location: 321/322
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Vartika Singh and Jeffrey Shmain walk you through approaches to finding meaningful patterns in real-world data with the machine learning algorithms available in Spark ML. Vartika and Jeff also demonstrate how to use open source deep learning frameworks with Spark to solve classification problems on image and text datasets.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 1
Holden Karau (Independent)
Average rating: 4.50 (6 ratings)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau introduces Spark’s ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark’s meta-algorithms and existing ML tools.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: 328/329
John Akred (Silicon Valley Data Science)
AI is white-hot at the moment, but where can it really be used? Developers are usually the first to understand why some technologies cause more excitement than others. John Akred relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2017 to explain why they’re exciting in terms of both new capabilities and the new economies they bring.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 1
Peng Meng (Intel)
Average rating: 1.00 (1 rating)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in a production environment.
11:15am–11:55am Thursday, December 7, 2017
Location: 310/311
Holden Karau (Independent), Joey Echeverria (Rocana)
Average rating: 4.20 (5 ratings)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more.
12:05pm–12:45pm Thursday, December 7, 2017
Location: 310/311
Carson Wang (Intel), Yucai Yu (Intel)
Average rating: 4.50 (2 ratings)
Spark SQL is one of the most popular components of Apache Spark. Carson Wang and Yucai Yu explore Intel's efforts to improve SQL performance and offer an overview of an adaptive execution mode they implemented for Spark SQL.
1:45pm–2:25pm Thursday, December 7, 2017
Location: 308/309
Average rating: 5.00 (1 rating)
Apache Beam lets data pipelines run in batch or streaming mode on a variety of open source and private cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam's programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.