Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Schedule: Media, Marketing, Advertising sessions

1:30pm–5:00pm Tuesday, March 26, 2019

The hitchhiker's guide to deep learning-based recommenders in production

Data Science, Machine Learning & AI
Location: 2002

Abhishek Kumar (Publicis Sapient), Pramod Singh (Walmart Labs)

Average rating: 4.17 (6 ratings)

Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microarchitectures.

11:00am–11:40am Wednesday, March 27, 2019

Recommendation engines and mobile gaming

Case studies, Strata Business Summit
Location: 2024

Bysshe Easton (KIXEYE), Thomas Dobbs (KIXEYE)

Average rating: 4.50 (2 ratings)

As fully closed model economies, games offer a rare opportunity to use analytics to create unique purchase opportunities for customers. Bysshe Easton and Thomas Dobbs explain how KIXEYE uses machine learning to create personalized offer recommendations for its customers, resulting in significantly increased monetization and retention.

11:00am–11:40am Wednesday, March 27, 2019

Scaling data lineage at Netflix to improve data infrastructure reliability and efficiency

Data Engineering & Architecture
Location: 2001

Jitender Aswani (Netflix), Di Lin (Netflix), Girish Lingappa (Netflix)

Average rating: 3.40 (15 ratings)

Hundreds of thousands of ETL pipelines ingest over a trillion events daily to populate millions of data tables downstream at Netflix. Jitender Aswani, Girish Lingappa, and Di Lin discuss Netflix’s internal data lineage service, which was essential for enhancing the platform’s reliability, increasing trust in data, and improving data infrastructure efficiency.

11:50am–12:30pm Wednesday, March 27, 2019

The journey toward a self-service data platform at Netflix

Data Engineering & Architecture
Location: 2002

Kurt Brown (Netflix)

Average rating: 4.22 (9 ratings)

The Netflix data platform is a massive-scale, cloud-only suite of tools and technologies. It includes big data tech (Spark and Flink), enabling services (federated metadata management), and machine learning support. But with power comes complexity. Kurt Brown explains how Netflix is working toward an easier, "self-service" data platform without sacrificing any enabling capabilities.

11:50am–12:30pm Wednesday, March 27, 2019

Artificial intelligence on human behavior: New insights into customer segmentation

Data Science, Machine Learning & AI
Location: 2016

Melinda Han Williams (Dstillery)

Average rating: 4.86 (14 ratings)

Customer segmentation based on coarse survey data is a staple of traditional market research. Melinda Han Williams explains how Dstillery uses neural networks to model the digital pathways of 100M consumers and uses the resulting embedding space to cluster customer populations into fine-grained behavioral segments and inform smarter consumer insights, in the process creating a map of the internet.

11:50am–12:30pm Wednesday, March 27, 2019

Applying deep learning at Google for recommendations

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Ron Bodkin (Google)

Average rating: 4.33 (6 ratings)

Google uses deep learning extensively in new and existing products. Join Ron Bodkin to learn how Google has used deep learning for recommendations at YouTube, in the Play store, and for customers in Google Cloud. You'll explore the role of embeddings, recurrent networks, contextual variables, and wide and deep learning and discover how to do candidate generation and ranking with deep learning.

5:10pm–5:50pm Wednesday, March 27, 2019

Purchase, play, and upgrade data for video game players

Case studies, Strata Business Summit
Location: 2024

Eric Bradlow (The Wharton School), Zachery Anderson (Electronic Arts)

Average rating: 3.00 (1 rating)

Eric Bradlow and Zachery Anderson discuss the Wharton Customer Analytics Initiative research opportunity process and explain how EA solved some of its business problems by sharing its data with 11 teams of researchers from around the world.

11:00am–11:40am Thursday, March 28, 2019

Creating a bionic newsroom

Data Science, Machine Learning & AI
Location: 2009

Boris Yakubchik (Forbes), Salah Zalatimo (Forbes)

Average rating: 4.50 (2 ratings)

Boris Yakubchik and Salah Zalatimo offer an overview of Bertie, Forbes's new publishing platform (an AI assistant that learns from writers and suggests improvements), and detail Bertie’s features, architecture, and ultimate goals, paying special attention to how the company implemented an ensemble of machine learning models that, together, make up the AI assistant's skill set and personality.

11:00am–11:40am Thursday, March 28, 2019

How Zhaopin.com built its enterprise event bus using Apache Pulsar

Data Engineering & Architecture, Streaming and IoT
Location: 2006

Sijie Guo (StreamNative), Penghui Li (Zhaopin)

Average rating: 4.00 (1 rating)

Using a messaging system to build an event bus is very common. However, some use cases demand a messaging system with a specific set of features. Sijie Guo and Penghui Li discuss the event bus requirements for Zhaopin.com, one of China's biggest online recruitment service providers, and explain why the company chose Apache Pulsar.

11:50am–12:30pm Thursday, March 28, 2019

Scanner: Efficient video analysis at scale

Data Engineering & Architecture
Location: 2008

Fait Poms (Stanford University), Will Crichton (Stanford University)

Average rating: 4.75 (4 ratings)

Video is now the largest source of data on the internet, so we need tools to make it easier to process and analyze. Fait Poms and Will Crichton offer an overview of Scanner, the first open source distributed system for building large-scale video processing applications, and explore real-world use cases.

11:50am–12:30pm Thursday, March 28, 2019

Infinite segmentation: Scalable mutual information ranking on real-world graphs

Data Science, Machine Learning & AI
Location: 2011

Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)

Average rating: 4.50 (2 ratings)

Today, normal growth isn't enough; you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way to new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks.

11:50am–12:30pm Thursday, March 28, 2019

How Netflix measures app performance on 250 million unique devices across 190 countries

Data Engineering & Architecture
Location: 2006

Vivek Pasari (Netflix), Jitender Aswani (Netflix)

Average rating: 3.14 (7 ratings)

Netflix has over 125 million members spread across 191 countries. Each day its members interact with its client applications on 250 million+ devices under highly variable network conditions. These interactions result in over 200 billion daily data points. Vivek Pasari dives into the data engineering and architecture that enables application performance measurement at this scale.

2:40pm–3:20pm Thursday, March 28, 2019

Building and scaling a security detection platform: A Netflix Original

Data Engineering & Architecture
Location: 2024

John Bennett (Netflix), Siamac Mirzaie (Netflix)

Average rating: 3.33 (3 ratings)

Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. John Bennett and Siamac Mirzaie offer an overview of Netflix's internal platform for quickly deploying data-based detection capabilities in the corporate environment.

3:50pm–4:30pm Thursday, March 28, 2019

Testing ad content with survey experiments

Data Science, Machine Learning & AI
Location: 2010

Patrick Miller (Civis Analytics)

Average rating: 3.40 (5 ratings)

Brands that test the content of ads before they are shown to an audience can avoid spending resources on the 11% of ads that cause backlash. Using a survey experiment to choose the best ad improves the effectiveness of marketing campaigns by 13% on average, and by up to 37% for particular demographics. Patrick Miller explores data collection and statistical methods for analysis and reporting.

3:50pm–4:30pm Thursday, March 28, 2019

The Paradise Papers and West Africa Leaks: Behind the scenes with the ICIJ

Business Analytics and Visualization, Strata Business Summit
Location: 2018

Pierre Romera (International Consortium of Investigative Journalists (ICIJ))

Average rating: 4.67 (6 ratings)

The ICIJ was the team behind the Panama Papers and Paradise Papers. Pierre Romera offers a behind-the-scenes look into the ICIJ's process and explores the challenges in handling 1.4 TB of data (in many different formats) and in making it available securely to journalists all over the world.

4:40pm–5:20pm Thursday, March 28, 2019

Taming large state to join datasets for personalization

Data Engineering & Architecture
Location: 2002

Sonali Sharma (Netflix), Shriya Arora (Netflix)

Average rating: 3.00 (2 ratings)

With so much data being generated in real time, what if we could combine all these high-volume data streams and provide near real-time feedback for model training, improving personalization and recommendations and taking the customer experience to a whole new level? Sonali Sharma and Shriya Arora explain how to do exactly that, using Flink's keyed state.

4:40pm–5:20pm Thursday, March 28, 2019

Efficient multi-armed bandit with Thompson sampling for applications with delayed feedback

Data Science, Machine Learning & AI
Location: 2010

Shradha Agrawal (Adobe)

Average rating: 4.17 (6 ratings)

Decision making often struggles with the exploration-exploitation dilemma. Multi-armed bandits (MAB) are a popular reinforcement learning solution, but increasing the number of decision criteria leads to an exponential blowup in complexity, and observational delays don’t allow for optimal performance. Shradha Agrawal offers an overview of MABs and explains how to overcome the above challenges.

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com