Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Big data and the Cloud sessions

9:00am–5:00pm Monday, September 25 & Tuesday, September 26
Location: 1A 03
Secondary topics: Architecture, Cloud, Streaming
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data, and how do you process it? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 23/24 Level: Beginner
Secondary topics: Cloud
Pranav Rastogi (Microsoft)
Average rating: 2.50 (2 ratings)
As big data solutions rapidly move to the cloud, it's increasingly important to know how to use Apache Hadoop, Spark, R Server, and other open source technologies in the cloud. Pranav Rastogi walks you through building big data applications on Azure HDInsight and other Azure services.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 10 Level: Intermediate
Secondary topics: Architecture, Cloud
Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera)
Average rating: 1.50 (2 ratings)
Jennifer Wu, Paul George, Fahd Siddiqui, and Eugene Fratkin lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud. Along the way, they share AWS infrastructure best practices and explain how data engineering workloads interoperate with data analytic workloads.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 15/16 Level: Intermediate
Secondary topics: Architecture, Cloud
Ryan Nienhuis (Amazon Web Services), Radhika Ravirala (Amazon Web Services (AWS)), Allan MacInnis (Amazon Web Services), Ben Snively (Amazon Web Services (AWS))
Average rating: 4.00 (2 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application on the cloud? Ryan Nienhuis, Radhika Ravirala, Allan MacInnis, and Ben Snively walk you through building a big data application using a combination of open source technologies and AWS managed services.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Advanced
Secondary topics: Financial services, Platform
Average rating: 4.57 (7 ratings)
John Hitchingham shares insights into the design and operation of FINRA's data lake in the AWS cloud, where FINRA extracts, transforms, and loads over 75B transactions per day. Users can query across petabytes of data in seconds on AWS S3 using Presto and Spark, all while maintaining security and data lineage.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Architecture, Cloud
Henry Robinson (Cloudera), Greg Rahn (Cloudera)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Jun Rao (Confluent)
Average rating: 5.00 (3 ratings)
Over the last few years, the streaming platform Apache Kafka has been used extensively for real-time data collection, delivery, and processing, particularly in the enterprise. Jun Rao leads a deep dive into some of the key internals that make Kafka popular and provide its strong reliability guarantees.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Cloud, Media, Platform
Josh Baer (Spotify), Alison Gilles (Spotify)
Average rating: 4.00 (1 rating)
In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify's technology choices, challenges faced, and the lessons Spotify learned along the way.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Cloud
Chris Mills (The Meet Group)
if(we)'s batch event processing pipeline is different from yours, but the process of migrating it from running in a data center to running in AWS is likely pretty similar. Chris Mills explains what was easier than expected, what was harder, and what the company wished it had known before starting the migration.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 18 Level: Intermediate
Secondary topics: Cloud
Stephen Wu (Microsoft)
Average rating: 4.00 (1 rating)
Remote storage in the cloud offers big data customers a highly scalable, cost-effective, and performant solution. Adoption is rapid because separating compute from storage brings flexibility and the cost savings of effectively unlimited storage capacity. Stephen Wu demonstrates how to tune the performance of your workloads when your data lives in remote cloud storage.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics: Cloud, R
Edgar Ruiz (RStudio)
Average rating: 4.00 (1 rating)
With R and sparklyr, a Spark standalone cluster can be used to analyze large datasets stored in S3 buckets. Edgar Ruiz walks you through setting up a Spark standalone cluster using EC2 and offers an overview of S3 bucket folder and file setup, connecting R to Spark, the settings needed to read S3 data into Spark, and an approach to importing and wrangling the data.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Cloud
Bill Havanki (Cloudera)
Speed and reliability in deploying big data clusters are key to effectiveness in the cloud. Drawing on ideas from his book Moving Hadoop to the Cloud, which covers essential practices like baking images and automating cluster configuration, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered from the cloud provider to scale up.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Intermediate
Secondary topics: Data for good, Media, Platform
Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply)
The Wikimedia Foundation (WMF) is a nonprofit charitable organization. As the parent organization of Wikipedia, one of the most visited websites in the world, WMF faces many unique challenges around its ecosystem of editors, readers, and content. Andrew Otto and Fangjin Yang explain how the WMF does analytics and offer an overview of the technology it uses to do so.
2:05pm–2:45pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Beginner
Secondary topics: Cloud
Michael McCune (Red Hat)
Average rating: 5.00 (2 ratings)
Notebook interfaces like Apache Zeppelin and Project Jupyter are excellent starting points for sketching out ideas and exploring data-driven algorithms, but where does the process lead after the notebook work has been completed? Michael McCune offers some answers as they relate to cloud-native platforms.
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Architecture
Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well suited for large-scale data engineering workloads. Jennifer Wu, Philip Langdale, and Kostas Sakellis explore the latest cloud technologies, focusing on data engineering workloads and their cost, security, and ease-of-use implications for data engineers.
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics: Streaming
Tim Berglund (Confluent)
Average rating: 2.50 (2 ratings)
Tim Berglund offers a thorough introduction to the Streams API, an important recent addition to Kafka that lets you build sophisticated stream processing systems that are as scalable and fault tolerant as Kafka itself. These systems also align well with the microservices sensibilities that are so common in contemporary architectural thinking.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics: Deep learning, Healthcare
Jon Fuller (KNIME), Olivia Klose (Microsoft)
Average rating: 3.00 (1 rating)
Jon Fuller and Olivia Klose explain how KNIME, Apache Spark, and Microsoft Azure enable fast, inexpensive automated classification of malignant lymphoma types in digital pathology images. The trained model is deployed to end users as a web application using the KNIME WebPortal.