Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Schedule: Data Platforms sessions

Over the last few years, many companies have begun rolling out data platforms for business intelligence and business analytics. More recently, companies have started to expand toward platforms that can support growing teams of data scientists. Common features of modern data science platforms include support for notebooks and open source machine learning libraries, project management (collaboration and reproducibility), and model visualization.

11:00am-11:40am Wednesday, March 27, 2019
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
Average rating: 4.50 (4 ratings)
Jian Chang and Sanjian Chen outline the design of the AI engine on Alibaba's TSDB service, which enables fast and complex analytics of large-scale retail data. They then share a successful case study of the Fresh Hema Supermarket, a major “new retail” platform operated by Alibaba Group, highlighting solutions to the major technical challenges in data cleaning, storage, and processing.
11:50am-12:30pm Wednesday, March 27, 2019
Average rating: 4.60 (5 ratings)
In a large global health services company, streaming data for processing and sharing comes with its own challenges. Data science and analytics platforms need fast access to data from relevant sources so they can act on it quickly and share the insights with consumers with the same urgency. Join Mohammad Quraishi to learn why streaming data architectures are a necessity; Kafka and Hadoop are key.
11:50am-12:30pm Wednesday, March 27, 2019
Kurt Brown (Netflix)
Average rating: 4.22 (9 ratings)
The Netflix data platform is a massive-scale, cloud-only suite of tools and technologies. It includes big data tech (Spark and Flink), enabling services (federated metadata management), and machine learning support. But with power comes complexity. Kurt Brown explains how Netflix is working toward an easier, "self-service" data platform without sacrificing any enabling capabilities.
2:40pm-3:20pm Wednesday, March 27, 2019
Zhenxiao Luo (Twitter)
Average rating: 4.09 (11 ratings)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Zhenxiao Luo explains how Uber supports real-time analytics with deep learning on the fly, without any data copying.
5:10pm-5:50pm Wednesday, March 27, 2019
Zhenxiao Luo (Twitter)
Average rating: 4.00 (4 ratings)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts are using deep learning and big data to train models, make predictions, and run analytics in real time. Zhenxiao Luo explains how Uber runs real-time analytics with deep learning.
5:10pm-5:50pm Wednesday, March 27, 2019
Rakesh Kumar (Lyft), Thomas Weise (Lyft)
Average rating: 4.00 (3 ratings)
Rakesh Kumar and Thomas Weise explore how Lyft dynamically prices its rides with a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability, allowing the pricing system to be more adaptable to real-world changes.
5:10pm-5:50pm Wednesday, March 27, 2019
Kevin Moore (Salesforce)
Average rating: 4.50 (2 ratings)
Kevin Moore walks you through how TransmogrifAI, Salesforce's open source AutoML library built on Spark, automatically generates models customized to a company's dataset and use case and provides insights into why a model makes the predictions it does.
9:45am-9:55am Thursday, March 28, 2019
Location: Ballroom
Theresa Johnson (Airbnb)
Average rating: 4.22 (18 ratings)
Airbnb uses AI and machine learning in many parts of its user-facing business. But it's also advancing the state of AI-powered internal tools. Theresa Johnson details the AI powering Airbnb's next-generation end-to-end metrics forecasting platform, which leverages machine learning, Bayesian inference, TensorFlow, Hadoop, and web technology.
11:00am-11:40am Thursday, March 28, 2019
Yue Li (MemVerge), Shouwei Chen (Rutgers University)
Average rating: 5.00 (4 ratings)
JD.com recently designed a brand-new architecture to optimize Spark computing clusters. Yue Li and Shouwei Chen detail the problems the team faced when building it and explain how the company now benefits from the in-memory distributed filesystem.
11:00am-11:40am Thursday, March 28, 2019
Sijie Guo (StreamNative), Penghui Li (Zhaopin)
Average rating: 4.00 (1 rating)
Using a messaging system to build an event bus is very common. However, some use cases demand a messaging system with a specific set of features. Sijie Guo and Penghui Li discuss the event bus requirements for Zhaopin.com, one of China's biggest online recruitment services providers, and explain why the company chose Apache Pulsar.
11:00am-11:40am Thursday, March 28, 2019
Subhadra Tatavarti (PayPal), Chen Kovacs (PayPal)
Average rating: 4.12 (8 ratings)
The PayPal data ecosystem is large, with 250+ PB of data transacting in 200+ countries. Given this massive scale and complexity, discovering and accessing the right datasets in a frictionless environment is a challenge. Subhadra Tatavarti and Chen Kovacs explain how PayPal’s data platform team is helping solve this problem with a combination of self-service, integrated, and interoperable products.
11:50am-12:30pm Thursday, March 28, 2019
Jason Wang (Cloudera), Sushant Rao (Cloudera)
Average rating: 4.00 (2 ratings)
Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms.
11:50am-12:30pm Thursday, March 28, 2019
Average rating: 4.75 (4 ratings)
Juan Paulo Gutierrez explains how a small team in Tokyo went through several evolutions as they built an analytics service to help 200+ businesses accelerate their decision-making process. Join in to hear about the background, challenges, architecture, success stories, and best practices as they built and productionized Rakuten Analytics.
11:50am-12:30pm Thursday, March 28, 2019
Francesco Mucio (francescomuc.io)
Average rating: 4.00 (2 ratings)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead.
11:50am-12:30pm Thursday, March 28, 2019
Vivek Pasari (Netflix), Jitender Aswani (Netflix)
Average rating: 3.14 (7 ratings)
Netflix has over 125 million members spread across 191 countries. Each day its members interact with its client applications on 250 million+ devices under highly variable network conditions. These interactions result in over 200 billion daily data points. Vivek Pasari dives into the data engineering and architecture that enables application performance measurement at this scale.
11:50am-12:30pm Thursday, March 28, 2019
Avner Braverman (Binaris)
Average rating: 4.00 (3 ratings)
What is serverless, and how can it be utilized for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless with respect to data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code.
2:40pm-3:20pm Thursday, March 28, 2019
Rohan Dhupelia (Atlassian), Jimmy Li (Atlassian)
Average rating: 4.67 (3 ratings)
Analytics is easy, but good analytics is hard. Atlassian knows this all too well. Rohan Dhupelia and Jimmy Li explain how the company's push to become truly data driven has transformed the way it thinks about behavioral analytics, from how it defined its events to how it ingests and analyzes them.
2:40pm-3:20pm Thursday, March 28, 2019
John Bennett (Netflix), Siamac Mirzaie (Netflix)
Average rating: 3.33 (3 ratings)
Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. John Bennett and Siamac Mirzaie offer an overview of Netflix's internal platform for quickly deploying data-based detection capabilities in the corporate environment.
3:50pm-4:30pm Thursday, March 28, 2019
Adrian Lungu (Adobe), Serban Teodorescu (Adobe)
Average rating: 4.75 (4 ratings)
Adrian Lungu and Serban Teodorescu explain how, inspired by the blue-green deployment technique, the Adobe Audience Manager team developed an active-passive database migration procedure that allows them to test database clusters in production, minimizing risk without compromising innovation.
3:50pm-4:30pm Thursday, March 28, 2019
Li Gao (Lyft), Bill Graham (Lyft)
Average rating: 4.00 (2 ratings)
Li Gao and Bill Graham discuss the challenges the Lyft team faced and solutions they developed to support Apache Spark on Kubernetes in production and at scale.
3:50pm-4:30pm Thursday, March 28, 2019
Václav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
Average rating: 4.00 (1 rating)
Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.
3:50pm-4:30pm Thursday, March 28, 2019
Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel)
Average rating: 2.67 (3 ratings)
Yuhao Yang and Jennie Wang demonstrate how to run distributed TensorFlow on Apache Spark with the open source software package Analytics Zoo. Compared to other solutions, Analytics Zoo is built for production environments and encourages more industry users to run deep learning applications with big data ecosystems.