Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Data engineering sessions

9:00am - 5:00pm Monday, September 25 & Tuesday, September 26
Location: 1A 04/05
Secondary topics:  Architecture, Cloud
SOLD OUT
Bruce Martin (Cloudera)
Average rating: 1.50 (2 ratings)
Bruce Martin leads you through designing and architecting solutions to a challenging business problem. You'll explore big data application architecture concepts in general and then apply them to the design of a challenging system.
11:20am - 12:00pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  ecommerce
Average rating: 3.00 (1 rating)
Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Neelesh explains the development process and shares some lessons learned along the way.
11:20am - 12:00pm Wednesday, September 27, 2017
Location: 1A 21/22 Level: Intermediate
Michelle Ufford (Netflix)
Average rating: 4.78 (9 ratings)
What if we used the wealth of data and experience at our disposal to drive improvements in data engineering? Michelle Ufford explains how Netflix is using data to find common patterns among the chaos that enable the company to automate repetitive and time-consuming tasks and discover ways to improve data quality, reduce costs, and quickly identify and respond to issues.
11:20am - 12:00pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Geospatial, Logistics, Platform
Zhenxiao Luo (Uber), Wei Yan (Uber)
Average rating: 4.43 (7 ratings)
Uber's geospatial data is increasing exponentially as the company grows. As a result, its big data systems must also grow in scalability, reliability, and performance to support business decisions, user recommendations, and experiments for geospatial data. Zhenxiao Luo and Wei Yan explain how Uber runs geospatial analysis efficiently in its big data systems, including Hadoop, Hive, and Presto.
1:15pm - 1:55pm Wednesday, September 27, 2017
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Architecture, IoT, Streaming
Michael Freedman (TimescaleDB | Princeton)
Average rating: 4.50 (4 ratings)
Michael Freedman offers an overview of TimescaleDB, a new open source scale-out database designed for time series workloads and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
4:35pm - 5:15pm Wednesday, September 27, 2017
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Architecture, Streaming
Paul Curtis (MapR Technologies)
Average rating: 4.67 (3 ratings)
A microservices architecture benefits from the agility of containers for convenient, predictable deployment of applications, while persistent, performant message streaming makes both work better. Paul Curtis explores these infrastructure components and discusses the design of highly scalable real-world systems that take advantage of this powerful triad.
4:35pm - 5:15pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Advanced
Secondary topics:  Architecture, Media, Platform
Barbara Eckman (Comcast)
Average rating: 3.00 (2 ratings)
Barbara Eckman offers an overview of Comcast’s streaming data platform, which is composed of a variety of ingest, transformation, and storage services. The platform uses Apache Avro schemas to support end-to-end data governance, Apache Atlas for data discovery and lineage, and custom asynchronous messaging libraries to notify Atlas of new data and schema entities and lineage links as they are created.
5:25pm - 6:05pm Wednesday, September 27, 2017
Location: 1A 23/24 Level: Intermediate
Ihab Ilyas (University of Waterloo | Tamr)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution.
1:15pm - 1:55pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Architecture, Financial services
Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
Average rating: 5.00 (2 ratings)
Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark, which encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results.
1:15pm - 1:55pm Thursday, September 28, 2017
Location: 1E 10/11 Level: Intermediate
Secondary topics:  Architecture, Media, Platform
Kurt Brown (Netflix)
Average rating: 4.40 (5 ratings)
Kurt Brown explains how to get the most out of your data infrastructure with 20 principles and practices used at Netflix. Kurt covers each in detail and explores how they relate to the technologies used at Netflix, including S3, Spark, Presto, Druid, R, Python, and Jupyter.
2:05pm - 2:45pm Thursday, September 28, 2017
Location: 1A 21/22 Level: Intermediate
Sneha Rao (Spotify), Joel Östlund (Spotify)
Spotify makes data-driven product decisions. As the company grows, the magnitude and complexity of the data it cares about most are rapidly increasing. Sneha Rao and Joel Östlund walk you through how Spotify stores and exposes audience data created by multiple internal producers.
2:05pm - 2:45pm Thursday, September 28, 2017
Location: 1E 07/08 Level: Intermediate
Gwen Shapira (Confluent)
Average rating: 3.33 (3 ratings)
There are many good reasons to run more than one Kafka cluster…and a few bad reasons too. Great architectures are driven by use cases, and multicluster deployments are no exception. Gwen Shapira offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions, so you can better choose the right architecture for your needs.
2:55pm - 3:35pm Thursday, September 28, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Architecture, Media, Platform
Felix GV (LinkedIn), Yan Yan (LinkedIn)
Average rating: 2.00 (1 rating)
Companies with batch and stream processing pipelines need to serve the insights they glean back to their users, an often-overlooked problem that can be hard to achieve reliably and at scale. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency.