Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Schedule: Big data and the cloud sessions

9:00am–5:00pm Monday, December 4 & Tuesday, December 5
Location: 335
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data, and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am–12:30pm Tuesday, December 5, 2017
Location: 308/309
Vinithra Varadharajan (Cloudera), Philip Langdale (Cloudera), Jason Wang (Cloudera), Fahd Siddiqui (Cloudera)
Average rating: 2.80 (5 ratings)
Vinithra Varadharajan, Philip Langdale, Jason Wang, and Fahd Siddiqui lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud, highlighting cloud infrastructure best practices and illustrating how data engineering workloads interoperate with data analytic engines.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 2
Yufeng Guo (Google)
Yufeng Guo demonstrates how to use TensorFlow to combine linear regression models and deep neural networks into a single machine learning model that has the benefits of both. You'll also learn what is happening under the hood and how you can use this model for your own datasets.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: 310/311
Feng Cheng (Grab), Yanyu Qu (Grab)
Average rating: 5.00 (1 rating)
Grab uses Presto to support operational reporting (batch and near real-time), ad hoc analyses, and its data pipeline. Currently, Grab has 5+ clusters with 100+ instances in production on AWS and serves up to 30K queries per day while supporting more than 200 internal data users. Feng Cheng and Yanyu Qu explain how Grab operationalizes Presto in the cloud and share lessons learned along the way.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: 310/311
Greg Rahn (Cloudera)
Average rating: 4.00 (1 rating)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance.
11:15am–11:55am Thursday, December 7, 2017
Location: Summit 2
Wee Hyong Tok (Microsoft), Danielle Dean (iRobot)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities.
12:05pm–12:45pm Thursday, December 7, 2017
Location: 321/322
John Mertic (Linux Foundation), Cupid Chan (4C Decision)
Average rating: 3.67 (3 ratings)
John Mertic and Cupid Chan share real end-user perspectives from companies like GE on how they are using big data tools, the challenges they face, and where they are looking to focus investments, all from a vendor-neutral viewpoint.
1:45pm–2:25pm Thursday, December 7, 2017
Location: 310/311
Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio, exploring the decision factors, considerations, and production best practices and solutions to best utilize CPUs, memory, and different tiers of disaggregated compute and storage systems to build out a multitenant high-performance platform.
5:05pm–5:45pm Thursday, December 7, 2017
Location: Summit 1
Le Zhang (Microsoft), Graham Williams (Microsoft)
Average rating: 3.00 (1 rating)
R has long been criticized for its limitations on scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources of manifold computing capability for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an end-to-end, enterprise-grade pipeline for big data analytics, all within R.
5:05pm–5:45pm Thursday, December 7, 2017
Location: 328/329
Arun Veettil (Skellam AI)
Arun Veettil shares his experience and lessons learned developing a customized, enterprise-level NLP platform to replace a leading text analytics vendor platform.