Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Schedule: Spark & beyond sessions

9:00am–12:30pm Tuesday, 09/27/2016
Location: 1 E 07/1 E 08 Level: Intermediate
Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.)
Average rating: ***..
(3.11, 19 ratings)
Vartika Singh and Jayant Shekhar walk you through techniques for building and tuning machine-learning apps using Spark MLlib and Spark ML Pipelines and graph processing with GraphX.
9:00am–5:00pm Tuesday, 09/27/2016
Location: Hall 1B
Zoltan Toth (datapao.com)
Average rating: **...
(2.90, 10 ratings)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. Through hands-on examples, Zoltan Toth explores various Wikipedia datasets to illustrate a variety of ideal programming paradigms.
9:00am–12:30pm Tuesday, 09/27/2016
Location: 1 E 15/1 E 16 Level: Intermediate
Dean Wampler (Lightbend)
Average rating: *****
(5.00, 4 ratings)
Apache Spark is written in Scala. Hence, many if not most data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
9:00am–12:30pm Tuesday, 09/27/2016
Location: Hall 1C Level: Intermediate
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers), Mauricio Vacas (Silicon Valley Data Science)
Average rating: ***..
(3.07, 15 ratings)
What are the essential components of a data platform? John Akred, Mauricio Vacas, and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
11:20am–12:00pm Wednesday, 09/28/2016
Location: Hall 1B Level: Beginner
Ram Sriharsha (Databricks)
Average rating: **...
(2.88, 17 ratings)
Ram Sriharsha reviews major developments in Apache Spark 2.0 and discusses future directions for the project to make Spark faster and easier to use for a wider array of workloads, with an emphasis on API evolution, single-node performance (Project Tungsten Phase 3), and Structured Streaming.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: Hall 1B Level: Advanced
Ted Malaska (Capital One), Mark Grover (Lyft)
Average rating: ***..
(3.92, 12 ratings)
Ted Malaska and Mark Grover cover the top five things that prevent Spark developers from getting the most out of their Spark clusters. When these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters and the same data, using just a different approach.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: Hall 1B Level: Intermediate
Average rating: ***..
(3.75, 4 ratings)
Spark's efficiency and speed can help reduce the TCO of existing clusters, because its performance advantages allow processing to complete in drastically shorter batch windows with higher performance per dollar. Raj Krishnamurthy offers a detailed walk-through of an alternating least squares-based matrix factorization workload in which runtimes were improved by a factor of 2.22.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: Hall 1B Level: Intermediate
François Garillot (Swisscom)
Average rating: ***..
(3.75, 4 ratings)
Swisscom, the leading mobile service provider in Switzerland, also provides data-driven intelligence through the analysis of its mobile network. Its Mobility Insights team works to help administrators understand the flow of people through their locations of interest. François Garillot explores the platform, tooling, and choices that make this service possible and some of the challenges the team has faced.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: Hall 1B Level: Intermediate
Neelesh Salian (Stitch Fix)
Average rating: ***..
(3.00, 7 ratings)
Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian focuses on five common issues observed in cluster environments set up with Apache Spark (Core, Streaming, and SQL) to help you improve the usability and supportability of Apache Spark and avoid such issues in future deployments.
11:20am–12:00pm Thursday, 09/29/2016
Location: Hall 1B Level: Beginner
Tags: real-time
Ram Sriharsha (Databricks)
Average rating: ***..
(3.25, 8 ratings)
Structured Streaming is a new effort in Apache Spark to make stream processing simple without the need to learn a new programming paradigm or system. Ram Sriharsha offers an overview of Structured Streaming, discussing its support for event-time, out-of-order/delayed data, sessionization, and integration with the batch data stack to show how it simplifies building powerful continuous applications.
1:15pm–1:55pm Thursday, 09/29/2016
Location: Hall 1B Level: Beginner
Yuhao Yang (Intel)
Average rating: ****.
(4.00, 5 ratings)
Through collaboration with some of the top payments companies around the world, Intel has developed an end-to-end solution for building fraud detection applications. Yuhao Yang explains how Intel used and extended Spark DataFrames and ML Pipelines to build the tool chain for financial fraud detection and shares the lessons learned during development.
2:05pm–2:45pm Thursday, 09/29/2016
Location: Hall 1B Level: Intermediate
Holden Karau (Google), Seth Hendrickson (Cloudera)
Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Spark's new Structured Streaming and walk you through creating your own streaming model.
2:55pm–3:35pm Thursday, 09/29/2016
Location: Hall 1B Level: Intermediate
Narasimhan Sampath (Choice Hotels International), Avinash Ramineni (Clairvoyant)
Average rating: ****.
(4.00, 1 rating)
Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement.
4:35pm–5:15pm Thursday, 09/29/2016
Location: Hall 1B Level: Intermediate
Tags: real-time
Jesse Anderson (Big Data Institute)
Average rating: *****
(5.00, 2 ratings)
Although Spark gets a lot of attention, people tend to think of only two languages as being supported: Python and Scala. Jesse Anderson proves that Java works just as well. With lambdas, we even get syntax comparable to Scala, so Java developers get the best of both worlds without having to learn Scala.