Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Schedule: Spark & beyond sessions

9:00–17:00 Wednesday, 1/06/2016
Location: Capital Suite 8/9
Sameer Farooqui (Databricks)
Average rating: ****.
(4.44, 16 ratings)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. Through hands-on examples, Sameer Farooqui explores various Wikipedia datasets to illustrate a variety of ideal programming paradigms.
9:00–12:30 Wednesday, 1/06/2016
Location: Capital Suite 12 Level: Intermediate
Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), Krishna Sankar (U.S. Bank)
Average rating: **...
(2.80, 15 ratings)
Jayant Shekhar, Vartika Singh, and Krishna Sankar explore techniques for building machine-learning apps using Spark ML as well as the principles of graph processing with Spark GraphX.
11:15–11:55 Thursday, 2/06/2016
Location: Capital Suite 13 Level: Non-technical
Tathagata Das (Databricks)
Average rating: ****.
(4.08, 12 ratings)
Spark 2.0 is a major milestone for the project. It delivers significant performance advances and introduces new initiatives to unify stream processing with Spark’s SQL engine. Tathagata Das explores these developments in Spark 2.0 as well as other major initiatives planned for the future.
12:05–12:45 Thursday, 2/06/2016
Location: Capital Suite 13 Level: Intermediate
Vida Ha (Databricks), Prakash Chockalingam (Databricks)
Average rating: ***..
(3.59, 17 ratings)
So you’ve successfully tackled big data. Now let Vida Ha and Prakash Chockalingam help you take it real time and conquer fast data. Vida and Prakash cover the most common use cases for streaming, important streaming design patterns, and best practices for implementing them to maximize the throughput and performance of your system using Spark Streaming.
14:55–15:35 Thursday, 2/06/2016
Location: Capital Suite 13 Level: Intermediate
Ted Dunning (MapR)
Average rating: ****.
(4.38, 13 ratings)
Telecom operators need to find operational anomalies in their networks very quickly. Spark plus a streaming architecture can solve these problems very nicely. Ted Dunning presents a practical architecture as well as some detailed algorithms for detecting anomalies in event streams. These algorithms are simple and quite general and can be applied across a wide variety of situations.
16:35–17:15 Thursday, 2/06/2016
Location: Capital Suite 13 Level: Intermediate
Holden Karau (Independent)
Average rating: ****.
(4.78, 9 ratings)
Holden Karau walks attendees through a number of common mistakes that can keep your Spark programs from scaling and examines solutions and general techniques useful for moving beyond a proof of concept to production, covering topics like when to use DataFrames, tuning, and working with key skew.
17:25–18:05 Thursday, 2/06/2016
Location: Capital Suite 13 Level: Intermediate
Kostas Sakellis (Cloudera)
Average rating: ****.
(4.00, 4 ratings)
As Spark is used more and more frequently for production workloads with stringent security requirements, fully locking down Spark applications has become critical. Kostas Sakellis explores the various facets of securing your Spark application.
11:15–11:55 Friday, 3/06/2016
Location: Capital Suite 13 Level: Intermediate
Tags: real-time
Tathagata Das (Databricks)
Average rating: ****.
(4.38, 8 ratings)
Tathagata Das explains how Spark 2.x develops the next evolution of Spark Streaming by extending DataFrames and Datasets in Spark to handle streaming data. Streaming Datasets provides a single programming abstraction for batch and streaming data and also brings support for event-time-based processing, out-of-order data, sessionization, and tight integration with nonstreaming data sources.
12:05–12:45 Friday, 3/06/2016
Location: Capital Suite 13 Level: Intermediate
Luc Bourlier (Lightbend)
Average rating: **...
(2.00, 1 rating)
Reactive Streams is an API designed to connect reactive systems with back-pressure. Luc Bourlier explains why, with Spark Streaming now supporting back-pressure, Reactive Streams is the right tool for integrating Spark Streaming into a reactive application.
14:05–14:45 Friday, 3/06/2016
Location: Capital Suite 13 Level: Intermediate
Ted Malaska (Capital One), Kai Voigt (Cloudera)
Average rating: ***..
(3.43, 7 ratings)
Ted Malaska leads an introduction to basic Spark concepts such as DAGs, RDDs, transformations, actions, and executors, designed for Java and Scala developers. You'll learn how your mindset must evolve beyond Java or Scala code that runs in a single JVM as you explore JVM locality, memory utilization, network/CPU usage, optimization of DAG pipelines, and serialization conservation.
14:05–14:45 Friday, 3/06/2016
Location: Capital Suite 14 Level: Intermediate
Neeraja Rentachintala (MapR Technologies)
Average rating: *****
(5.00, 2 ratings)
Neeraja Rentachintala discusses the latest integrations between Apache Drill and Spark technologies. Together, they allow Spark users to leverage Drill’s flexible schema and dynamic schema discovery capabilities to query and work with complex data directly using familiar Spark programming paradigms.
16:35–17:15 Friday, 3/06/2016
Location: Capital Suite 13 Level: Intermediate
Neelesh Salian (Stitch Fix)
Average rating: **...
(2.43, 7 ratings)
Spark deployments have been growing over the past year. The amount of data being analyzed and processed through the framework is massive and continues to push the boundaries of the engine. Drawing on his experiences across 150+ production deployments, Neelesh Srinivas Salian explores common issues observed in cluster environments set up with Apache Spark.