Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Retail and e-commerce sessions

11:20am–12:00pm Wednesday, 09/12/2018

Machine learning for time series: What works and what doesn't

Location: 1A 15/16 Level: Intermediate

Mikio Braun (Zalando)

Average rating: 4.86 (7 ratings)

Time series data has many applications in industry, from analyzing server metrics to monitoring IoT signals and detecting outliers. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, covering industry use cases and recommendations for what works and what doesn’t.

1:15pm–1:55pm Wednesday, 09/12/2018

Harnessing and customizing state-of-the-art recommendation solutions with OpenRec

Location: 1A 15/16 Level: Intermediate

Longqi Yang (Cornell Tech, Cornell University)

State-of-the-art recommendation algorithms are increasingly complex and no longer one-size-fits-all. Current monolithic development practices pose significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios.

1:15pm–1:55pm Wednesday, 09/12/2018

Data University: How Airbnb democratized data

Location: 1E 10/11 Level: Non-technical

Erin Coffman (Airbnb)

Average rating: 5.00 (7 ratings)

Airbnb has open-sourced many high-leverage data tools, including Airflow, Superset, and the Knowledge Repo, but adoption of these tools across the company was relatively low. Erin Coffman offers an overview of Data University, which Airbnb launched to make data more accessible and more widely used in decision-making across the company.

2:05pm–2:45pm Wednesday, 09/12/2018

Continuous machine learning over streaming data: The story continues.

Location: 1A 12/14 Level: Intermediate

Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)

Average rating: 5.00 (3 ratings)

Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams.

2:05pm–2:45pm Wednesday, 09/12/2018

Bighead: Airbnb's end-to-end machine learning platform

Location: 1A 08 Level: Beginner

Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)

Average rating: 5.00 (3 ratings)

Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces.

2:55pm–3:35pm Wednesday, 09/12/2018

Zipline: Airbnb's data management platform for machine learning

Location: 1A 21/22 Level: Intermediate

Varant Zanoyan (Airbnb)

Average rating: 4.33 (6 ratings)

Zipline is Airbnb’s soon-to-be-open-sourced data management platform, designed specifically for ML use cases. It has cut the time required for feature generation from months to days and provides capabilities that support end-to-end data management for machine learning. Varant Zanoyan covers Zipline's architecture and dives into how it solves ML-specific problems.

4:35pm–5:15pm Wednesday, 09/12/2018

When Tiramisu meets online fashion retail

Location: 1A 15/16 Level: Intermediate

Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)

Average rating: 5.00 (1 rating)

Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual for some items to have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning to image segmentation and background removal.

11:20am–12:00pm Thursday, 09/13/2018

The care and feeding of data scientists: Concrete tips for retaining your data science team

Location: 1E 10/11 Level: Non-technical

Michelangelo D'Agostino (ShopRunner)

Average rating: 4.75 (4 ratings)

Data scientists are hard to hire. But too often, companies struggle to find the right talent only to make avoidable mistakes that cause their best data scientists to leave. From org structure and leadership to tooling, infrastructure, and more, Michelangelo D'Agostino shares concrete (and inexpensive) tips for keeping your data scientists engaged, productive, and adding business value.

11:20am–12:00pm Thursday, 09/13/2018

Executive Briefing: From Business to AI—The missing pieces in becoming "AI ready"

Location: 1E 14 Level: Intermediate

Mikio Braun (Zalando)

Average rating: 2.75 (4 ratings)

To become "AI ready," an organization must not only provide the right technical infrastructure for data collection and processing but also learn new skills. Mikio Braun highlights three pieces companies often miss when trying to become AI ready: making the connection between business problems and AI technology, implementing AI-driven development, and running AI-based projects.

2:00pm–2:40pm Thursday, 09/13/2018

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Location: 1E 09 Level: Beginner

Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)

Average rating: 3.00 (1 rating)

Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. For example, one framework, JDPresto, has seen a 10x performance improvement on average.

3:30pm–4:10pm Thursday, 09/13/2018

Scaling data infrastructure in the fashion world; or, “What is this? Business intelligence for ants?”

Location: 1E 10/11 Level: Non-technical

Francesco Mucio (Francescomuc.io)

Average rating: 3.50 (2 ratings)

Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. He also shares what Zalando learned along the way and the challenges that still lie ahead.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com