Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Retail and e-commerce sessions

11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Mikio Braun (Zalando SE)
Average rating: ****.
(4.86, 7 ratings)
Time series data has many applications in industry, from analyzing server metrics to monitoring IoT signals and detecting outliers. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn’t, and industry use cases.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Longqi Yang (Cornell Tech, Cornell University)
State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practice poses significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Non-technical
Erin Coffman (Airbnb)
Average rating: *****
(5.00, 7 ratings)
Airbnb has open-sourced many high-leverage data tools, including Airflow, Superset, and the Knowledge Repo, but adoption of these tools across the company was relatively low. Erin Coffman offers an overview of Data University, launched to make data more accessible and utilized in decision making at Airbnb.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)
Average rating: *****
(5.00, 3 ratings)
Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)
Average rating: *****
(5.00, 3 ratings)
Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Varant Zanoyan (Airbnb)
Average rating: ****.
(4.33, 6 ratings)
Zipline is Airbnb’s soon-to-be open-sourced data management platform specifically designed for ML use cases. It has taken the task of feature generation from months to days and offers features to support end-to-end data management for machine learning. Varant Zanoyan covers Zipline's architecture and dives into how it solves ML-specific problems.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)
Average rating: *****
(5.00, 1 rating)
Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual that some items have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Non-technical
Average rating: ****.
(4.75, 4 ratings)
Data scientists are hard to hire. But too often, companies struggle to find the right talent only to make avoidable mistakes that cause their best data scientists to leave. From org structure and leadership to tooling, infrastructure, and more, Michelangelo D'Agostino shares concrete (and inexpensive) tips for keeping your data scientists engaged, productive, and adding business value.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 14 Level: Intermediate
Mikio Braun (Zalando SE)
Average rating: **...
(2.75, 4 ratings)
In order to become "AI ready," an organization not only has to provide the right technical infrastructure for data collection and processing but also must learn new skills. Mikio Braun highlights three pieces companies often miss when trying to become AI ready: making the connection between business problems and AI technology, implementing AI-driven development, and running AI-based projects.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 09 Level: Beginner
Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)
Average rating: ***..
(3.00, 1 rating)
Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Non-technical
Francesco Mucio (Zalando)
Average rating: ***..
(3.50, 2 ratings)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead.