Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Spark & Beyond conference sessions

A deep dive into an extremely popular big data framework: we’ll cover best practices, architectural considerations, and real-world case studies drawn from startups all the way to large enterprises.

Tuesday, September 29

9:00am–5:00pm Tuesday, 09/29/2015
Location: 1 E19 / 1 E20 / 1 E21 Level: Intermediate
Anthony D. Joseph (UC Berkeley | Databricks)
Average rating: 3.32 (50 ratings)
Spark Camp provides a day-long, hands-on introduction to the Spark platform, including the core API, Spark SQL, Spark Streaming, MLlib, GraphX, and more. We will cover each Spark component through a series of technical talks targeted at developers who are new to Spark, intermixed with hands-on lab work.
1:30pm–5:00pm Tuesday, 09/29/2015
Location: 3D 03/10 Level: Intermediate
Stephen O'Sullivan (Data Whisperers), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)
Average rating: 3.38 (24 ratings)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
1:30pm–5:00pm Tuesday, 09/29/2015
Location: 3D 06/07 Level: Intermediate
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Average rating: 3.88 (17 ratings)
Apache Drill is an open source distributed SQL engine for Hadoop, NoSQL databases, and other services. Drill's unique schema-free JSON data model enables self-service data exploration and analysis by eliminating the need to define and maintain schemas and transform data. This comprehensive hands-on tutorial will enable you to start exploring and analyzing your data in place, wherever it is.

Wednesday, September 30

11:20am–12:00pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Patrick Wendell (Databricks)
Average rating: 3.86 (22 ratings)
In the last year, Spark has seen substantial growth in adoption as well as in the pace and scope of its development. This talk will look forward and discuss both technical initiatives and the evolution of the Spark community.
1:15pm–1:55pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Hossein Falaki (Databricks Inc.)
Average rating: 3.65 (26 ratings)
R is the favorite language of many data scientists. In addition to being a language and runtime, R is a rich ecosystem of libraries for a wide range of use cases, from statistical inference to data visualization. However, handling large or distributed data with R is challenging, so most data scientists use R alongside other frameworks and languages.
2:05pm–2:45pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: health
Timothy Danford (Tamr, Inc.)
Average rating: 3.52 (23 ratings)
A revolution in DNA sequencing technology has led to exponential growth in the genomics data available to discover new drugs, diagnose patients, and understand the fundamental biology of human disease. Existing bioinformatics tools will have difficulty scaling to meet the challenges posed by this growth. Learn about next-generation tools for bioinformatics and genomics using Spark and Parquet.
2:55pm–3:35pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Non-technical
Tags: iot
Håkan Jonsson (Sony Mobile Communications)
Average rating: 3.33 (6 ratings)
In this talk, we will show how Sony Mobile uses large-scale analytics on Spark to generate insights for Lifelog users about themselves and the population, and how we use analytics to build a user lifecycle model that allows us to take actions toward increased user engagement and retention.
4:35pm–5:15pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Holden Karau (Independent)
Average rating: 4.17 (18 ratings)
This session explores best practices for creating both unit and integration tests for Spark programs, as well as acceptance tests for the data produced by our Spark jobs. We will explore the difficulties with testing streaming programs, options for setting up integration testing with Spark, and also examine best practices for acceptance tests.
5:25pm–6:05pm Wednesday, 09/30/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Sandy Ryza (Clover Health)
Average rating: 3.80 (5 ratings)
How much can you expect to lose? The financial statistic Value at Risk seeks to answer this question, but is computationally intensive to estimate. At Cloudera, we’ve assisted several organizations in using Spark to compute VaR and other financial statistics. The talk, which walks through a basic VaR calculation, aims to give a feel for what it is like to approach financial modeling with Spark.
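For readers unfamiliar with the statistic, a basic Monte Carlo VaR estimate can be sketched as follows. This is a minimal, single-machine illustration, not the method presented in the talk: it assumes a single portfolio whose one-period return is normally distributed, and the function name `monte_carlo_var` and its parameters are hypothetical. The trials are independent of one another, which is what makes this kind of simulation a natural fit for spreading across a cluster.

```python
import random


def monte_carlo_var(portfolio_value, mean_return, volatility,
                    confidence=0.95, trials=100_000, seed=42):
    """Estimate one-period Value at Risk (VaR) by Monte Carlo simulation.

    Hypothetical helper for illustration only. Assumes the portfolio's
    one-period return is normally distributed with the given mean and
    volatility (standard deviation). Returns VaR as a positive loss amount.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    # Simulate the profit/loss of the portfolio in each independent trial,
    # then sort from worst loss to best gain.
    pnl = sorted(portfolio_value * rng.gauss(mean_return, volatility)
                 for _ in range(trials))
    # The (1 - confidence) quantile of the P&L distribution; VaR is its negation.
    index = int((1 - confidence) * trials)
    return -pnl[index]


if __name__ == "__main__":
    var_95 = monte_carlo_var(1_000_000, 0.0, 0.02)
    print(f"95% one-period VaR: ${var_95:,.0f}")
```

With a $1,000,000 portfolio, zero mean return, and 2% volatility, the estimate should land near the closed-form normal value of roughly 1.645 × 0.02 × 1,000,000 ≈ $32,900; real VaR models typically replace the normal assumption with historical or factor-based return scenarios.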

Thursday, October 1

11:20am–12:00pm Thursday, 10/01/2015
Location: 1 E20 / 1 E21 Level: Advanced
Tathagata Das (Databricks)
Average rating: 4.20 (15 ratings)
As adoption of Spark Streaming in industry increases, so does the community's demand for more features. Since the beginning of this year, we have made significant improvements in performance, usability, and semantic guarantees. In this talk, I discuss these improvements, as well as the features we plan to add in the near future.
1:15pm–1:55pm Thursday, 10/01/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: media, featured
Daniel Weeks (Netflix)
Average rating: 4.52 (23 ratings)
The Big Data Platform team at Netflix continues to push big data processing in the cloud with the addition of Spark to our platform. Recent enhancements to Spark allow us to effectively leverage it for processing against a 10+ petabyte warehouse backed by S3. We will share our experiences and the performance of our production jobs, along with the pains and gains of deploying Spark at scale on YARN.
2:05pm–2:45pm Thursday, 10/01/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Christopher Nguyen (Arimo), Vu Pham (Adatao, Inc.), Michael Bui (Adatao, Inc.)
Average rating: 4.14 (7 ratings)
Deep learning algorithms have been used in many real-world applications, such as computer vision, machine translation, and fraud detection. We'll present an overview of the system architecture and of training and running deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) on Spark with Tachyon, including the use of GPUs to improve execution time.
2:55pm–3:35pm Thursday, 10/01/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Dean Wampler (Anyscale)
Average rating: 4.33 (6 ratings)
Apache Spark is often seen as a replacement for MapReduce in Hadoop systems, but Spark clusters can also be deployed and managed by Mesos. This talk explains how to use Mesos for Spark applications. We'll examine the pros and cons of using Mesos vs. Hadoop YARN as a data platform, and discuss practical issues when running Spark on Mesos. We'll even discuss how to combine the two with Myriad.
4:35pm–5:15pm Thursday, 10/01/2015
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: media
Sridhar Alla (BlueWhale), Jan Neumann (Comcast)
Average rating: 3.67 (12 ratings)
Comcast uses Hadoop as the big data platform in several areas of its business. Its use cases have evolved in recent years and include personalization, click-through analytics, modeling, and customer support initiatives, all adding up to billions of dollars in revenue.