Mar 15–18, 2020

Schedule: Data, Analytics, and AI Architecture sessions

9:00am–5:00pm Monday, March 16, 2020
Location: LL20A
Jeffrey Vah (Dell Technologies), Gayathri Rau (Dell Technologies), Shuo Xiang (Robinhood), Grace Lu (Robinhood), Maureen Teyssier (Reonomy), Aaron Williams (OmniSci), Sriram Ravindran (Adobe Inc), Deepak Pai (Adobe), Shubranshu Shekhar (Carnegie Mellon University), Sherin Thomas (Lyft), Dan Gifford (Getty Images), Shondria Lopez-Merlos (Florida Conference of The United Methodist Church), Sandhya Raghavan (Virgin Hyperloop One), Patryk Oleniuk (Virgin Hyperloop One), Ian Beaver (Verint), Aryn Sargent (Verint)
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata Data & AI.
1:30pm–5:00pm Monday, March 16, 2020
Location: LL21B
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems for each stage of an end-to-end data processing pipeline: messaging, compute, and storage. You'll get an overview of the inception and growth of the serverless paradigm and explore Apache Pulsar, which provides native serverless support in the form of Pulsar Functions.
11:00am–11:40am Tuesday, March 17, 2020
Location: LL21 F
Sandeep Uttamchandani (Intuit), Giriraj Bagdi (Intuit), Sunil Goplani (Intuit)
Data quality metrics focus on quantifying whether data is a mess. But you need to identify lead indicators before data becomes a mess. Sandeep Uttamchandani, Giriraj Bagdi, and Sunil Goplani explore developing lead indicators for data quality for Intuit's production data pipelines. You'll learn about the details of lead indicators, optimization tools, and lessons that moved the needle on data quality.
1:45pm–2:25pm Tuesday, March 17, 2020
Location: LL20D
Suneeta Mall (Nearmap)
Using Kubernetes as the backbone of AI infrastructure, Nearmap built a fully automated deep learning inference pipeline that's highly resilient, scalable, and massively parallel. Using this system, Nearmap ran semantic segmentation over tens of quadrillions of pixels. Suneeta Mall demonstrates the solution using Kubernetes in big data crunching and machine learning at scale.
4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL21 C
Lior Gavish (Barracuda)
Lior Gavish breaks down a machine learning (ML)-based system that detects a highly evasive type of email-based fraud. The system combines innovative techniques for labeling and classifying highly unbalanced datasets with a distributed cloud application capable of processing high-volume communication in real time.
5:05pm–5:45pm Tuesday, March 17, 2020
Location: LL21 F
Balaji Varadarajan (Uber), Vinoth Chandar (Apache Hudi)
Batch processing can benefit immensely from adopting some techniques from the stream processing world. Balaji Varadarajan shares how Apache Hudi (incubating), an open source project created at Uber and currently incubating with the ASF, can bridge this gap and enable more productive, efficient batch data engineering.
5:05pm–5:45pm Tuesday, March 17, 2020
Location: LL20A
Ben Galewsky (National Center for Supercomputing Applications), Lindsey Gray (Fermi National Accelerator Laboratory), Andrew Melo (Vanderbilt University)
Building a data engineering pipeline for serving segments of a 200 PB dataset to particle physicists around the globe poses many challenges, some unique to high energy physics and some to big science projects across disciplines. Ben Galewsky, Lindsey Gray, and Andrew Melo highlight how much of it can inform industry data science at scale.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL21 C
Micah Wylde (Lyft)
Lyft processes millions of events per second in real time to compute prices, balance marketplace dynamics, and detect fraud, among many other use cases. Micah Wylde showcases how Lyft uses Kubernetes along with Flink, Beam, and Kafka to enable service engineers and data scientists to easily build real-time data applications.
4:15pm–4:55pm Wednesday, March 18, 2020
Location: LL21A
Chendi Xue (Intel), Jian Zhang (Intel), Binwei Yang (Intel)
Chendi Xue and Jian Zhang explore how Intel accelerated Spark SQL with AVX-supported vectorization technology. They outline the design and evaluation, including how to enable columnar processing in Spark SQL, how to use Arrow as intermediate data, how to leverage AVX-enabled Gandiva for data processing, and performance analysis with system metrics and breakdown.
4:15pm–4:55pm Wednesday, March 18, 2020
Location: LL20A
Zhe Zhang (LinkedIn), Huangming Xie (LinkedIn)
Compute efficiency optimization is of critical importance in the big data era, as data science and ML algorithms become increasingly complex and data size increases exponentially over time. Opportunities exist throughout the resource use funnel, which Zhe Zhang and Huangming Xie characterize using a CLUE framework.
