Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Schedule: Spark and beyond sessions

1:30pm–5:00pm Tuesday, December 5, 2017
Location: 321/322 Level: Intermediate
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Vartika Singh and Jeffrey Shmain walk you through various approaches to finding meaningful patterns in real-world data using the machine learning algorithms available in Spark ML. Vartika and Jeff also demonstrate how to use open source deep learning frameworks with Spark to run classification problems on image and text datasets.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 1 Level: Intermediate
Holden Karau (Google)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau introduces Spark’s ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark’s meta-algorithms and existing ML tools.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: 328/329 Level: Non-technical
John Akred (Silicon Valley Data Science)
AI is white-hot at the moment, but where can it really be used? Developers are usually the first to understand why some technologies cause more excitement than others. John Akred relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2017 to explain why they’re exciting in terms of both new capabilities and the new economies they bring.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 1 Level: Intermediate
Peng Meng (Intel)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com’s production environment.
11:15am–11:55am Thursday, December 7, 2017
Location: 310/311 Level: Intermediate
Holden Karau (Google), Joey Echeverria (Rocana)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more.
12:05pm–12:45pm Thursday, December 7, 2017
Location: 310/311 Level: Intermediate
Carson Wang (Intel), Yucai Yu (Intel)
Spark SQL is one of the most popular components of Apache Spark. Carson Wang and Yucai Yu explore Intel's efforts to improve SQL performance and offer an overview of an adaptive execution mode they implemented for Spark SQL.
1:45pm–2:25pm Thursday, December 7, 2017
Location: 308/309 Level: Advanced
Apache Beam lets data pipelines run in batch or streaming mode on a variety of open source and private cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam's programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.