Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Media sessions

2:05pm–2:45pm Wednesday, September 27, 2017
Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
Average rating: 4.50 (2 ratings)
The quality of online comments is critical to the Washington Post. However, managing the quality of the comment section currently requires costly manual effort. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comment moderation, and share the challenges they faced in developing ModBot and deploying it into production.
4:35pm–5:15pm Wednesday, September 27, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 23/24 Level: Advanced
Barbara Eckman (Comcast)
Average rating: 3.00 (2 ratings)
Barbara Eckman offers an overview of Comcast’s streaming data platform, composed of a variety of ingest, transformation, and storage services. The platform uses Apache Avro schemas to support end-to-end data governance, Apache Atlas for data discovery and lineage, and custom asynchronous messaging libraries to notify Atlas of new data and schema entities and lineage links as they are created.
5:25pm–6:05pm Wednesday, September 27, 2017
Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University)
Modern enterprises are data driven and want to move at light speed. To achieve real-time performance, financial applications use streaming infrastructures for low latency and high throughput. Twitter Heron is an open source streaming engine with latency of around 14 ms. Karthik Ramasamy and Supun Kamburugamuve explain how they ported Heron to InfiniBand to achieve latencies as low as 7 ms.
5:25pm–6:05pm Wednesday, September 27, 2017
Machine Learning & Data Science, Spark & beyond
Location: 1A 08/10 Level: Advanced
Seth Hendrickson (Cloudera), DB Tsai (Netflix)
Average rating: 5.00 (1 rating)
Recent developments in Spark MLlib let users express a wider class of ML models and decrease model training times by using custom parameter optimization algorithms. Seth Hendrickson and DB Tsai explain when and how to use this new API and walk you through creating your own Spark ML optimizer. Along the way, they also share performance benefits and real-world use cases.
5:25pm–6:05pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Josh Baer (Spotify), Alison Gilles (Spotify)
Average rating: 4.00 (1 rating)
In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify's technology choices, challenges faced, and the lessons Spotify learned along the way.
1:15pm–1:55pm Thursday, September 28, 2017
Data-driven business management, Strata Business Summit
Location: 1A 18 Level: Intermediate
Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn)
Average rating: 5.00 (1 rating)
Michael Li and Chi-Yi Kuan offer an overview of the EOI (enable-optimize-innovate) framework for big data analytics and explain how to use this framework to drive and grow business in key corporate functions, such as product, marketing, and sales.
1:15pm–1:55pm Thursday, September 28, 2017
Data engineering, Strata Business Summit
Location: 1E 10/11 Level: Intermediate
Kurt Brown (Netflix)
Average rating: 4.40 (5 ratings)
Kurt Brown explains how to get the most out of your data infrastructure with 20 principles and practices used at Netflix. Kurt covers each in detail and explores how they relate to the technologies used at Netflix, including S3, Spark, Presto, Druid, R, Python, and Jupyter.
1:15pm–1:55pm Thursday, September 28, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 21/22 Level: Intermediate
Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply)
The Wikimedia Foundation (WMF) is a nonprofit charitable organization. As the parent organization of Wikipedia, one of the most visited websites in the world, WMF faces many unique challenges around its ecosystem of editors, readers, and content. Andrew Otto and Fangjin Yang explain how the WMF does analytics and offer an overview of the technology it uses to do so.
2:55pm–3:35pm Thursday, September 28, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 23/24 Level: Intermediate
Felix GV (LinkedIn), Yan Yan (LinkedIn)
Average rating: 2.00 (1 rating)
Companies with batch and stream processing pipelines need to serve the insights they glean back to their users, an often-overlooked problem that can be hard to solve reliably and at scale. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency.
2:55pm–3:35pm Thursday, September 28, 2017
Data Engineering & Architecture, Law, ethics, governance
Location: 1A 01/02 Level: Intermediate
Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
Shirshanka Das and Tushar Shanbhag explore the big data ecosystem at LinkedIn and share the company's journey to preserve member privacy while providing data democracy. Shirshanka and Tushar focus on three foundational building blocks for scalable data management that can meet data compliance regulations: a central metadata system, an integrated data movement platform, and a unified data access layer.