Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference
Singapore

Schedule: Spark & beyond sessions

A deep dive into an extremely popular big data framework: we’ll cover best practices, architectural considerations, and real-world case studies drawn from startups to large enterprises.

9:00am–5:00pm Tuesday, December 6, 2016
Location: 328/329
Tags: real-time
Sameer Farooqui (Databricks)
Average rating: *****
(5.00, 1 rating)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. Through hands-on examples, Sameer Farooqui explores various Wikipedia datasets to illustrate a variety of ideal programming paradigms.
9:00am–12:30pm Tuesday, December 6, 2016
Location: 321/322 Level: Intermediate
Dean Wampler (Lightbend)
Average rating: ****.
(4.00, 1 rating)
Apache Spark is written in Scala. Hence, many, if not most, data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
9:00am–12:30pm Tuesday, December 6, 2016
Location: 323 Level: Intermediate
Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.)
Average rating: ***..
(3.00, 7 ratings)
Vartika Singh and Jayant Shekhar offer a hands-on tutorial that exposes you to techniques for building and tuning machine-learning apps using Spark ML libraries, building pipelines, tuning parameters, and graph processing with GraphX.
11:15am–11:55am Wednesday, December 7, 2016
Location: Summit 1 Level: Beginner
Xueyan Li (Qunar), Chunming Li (Garena)
Average rating: *....
(1.00, 2 ratings)
Real-time data analysis is becoming more and more important to Internet companies’ daily business. Qunar has been running Alluxio in production for over a year. Lei Xu explores how stream processing on Alluxio has led to a 16x performance improvement on average and 300x improvement at service peak time on workloads at Qunar.
11:15am–11:55am Wednesday, December 7, 2016
Location: Summit 2 Level: Intermediate
Ted Malaska (Capital One), Mark Grover (Lyft)
Average rating: ****.
(4.12, 8 ratings)
Ted Malaska and Mark Grover cover the top five things that prevent Spark developers from getting the most out of their Spark clusters. When these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters and the same data, using just a different approach.
11:15am–11:55am Wednesday, December 7, 2016
Location: 321/322 Level: Non-technical
John Akred (Silicon Valley Data Science)
Average rating: ***..
(3.22, 9 ratings)
Spark is white-hot, but why does it matter? Some technologies cause more excitement than others, and at first the only people who understand why are the developers who use them. John Akred offers a tour through the hottest emerging data technologies of 2016 and explains why they’re exciting, in the context of the new capabilities and economies they bring.
12:05pm–12:45pm Wednesday, December 7, 2016
Location: Summit 2 Level: Advanced
Dean Wampler (Lightbend)
Average rating: ***..
(3.80, 5 ratings)
The success of Apache Spark is bringing developers to Scala. For big data, the JVM uses memory inefficiently, causing significant GC challenges. Spark's Project Tungsten fixes these problems with custom data layouts and code generation. Dean Wampler gives an overview of Spark, explaining ongoing improvements and what we should do to improve Scala and the JVM for big data.
2:35pm–3:15pm Wednesday, December 7, 2016
Location: Summit 2 Level: Beginner
Jiri Simsa (Alluxio)
Average rating: ****.
(4.50, 2 ratings)
Alluxio is an open source memory-speed virtual distributed storage system. In the past year, the Alluxio open source community has grown to more than 300 developers. The project also experienced a tremendous improvement in performance and scalability and was extended with new features. Haoyuan Li offers an overview of Alluxio, covering its use cases, its community, and the value it brings.
5:05pm–5:45pm Wednesday, December 7, 2016
Location: 321/322 Level: Advanced
Average rating: *....
(1.00, 1 rating)
Creating big data solutions that can process data at terabyte scale and produce spatial-temporal real-time insights at speed demands a well-thought-through system architecture. Chandras Sekhar Saripaka details the production architecture at DataSpark that works through terabytes of spatial-temporal telco data each day in PaaS mode and showcases how DataSpark operates in SaaS mode.
12:05pm–12:45pm Thursday, December 8, 2016
Location: 321/322 Level: Intermediate
Tags: streaming
Holden Karau (Independent), Seth Hendrickson (Cloudera)
Average rating: ****.
(4.67, 6 ratings)
Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Spark's new Structured Streaming and walk you through creating your own streaming model.
1:45pm–2:25pm Thursday, December 8, 2016
Location: Summit 2 Level: Intermediate
Andrea Gagliardi La Gala (Microsoft), Brandon Lee (Mediacorp)
Average rating: ****.
(4.50, 2 ratings)
Mediacorp analyzes its online audience through a computationally and economically efficient cloud-based platform. The cornerstone of the platform is Apache Spark, a framework whose clean APIs and performance gains make it an ideal choice for data scientists. Andrea Gagliardi La Gala and Brandon Lee highlight the platform’s architecture, benefits, and considerations for deploying it in production.
1:45pm–2:25pm Thursday, December 8, 2016
Location: 321/322 Level: Beginner
Vinay Shukla (Hortonworks)
With enterprise adoption of Apache Spark comes the need to meet enterprise security standards. Vinay Shukla walks you through these requirements, provides a deep dive into Spark security features, and shows how Spark meets them.