Mar 15–18, 2020

Schedule: Governance sessions

11:00am–11:40am Tuesday, March 17, 2020
Location: LL20C
Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)
Production deployment of machine learning (ML) models requires data governance, because models are data. Dean Wampler and Boris Lublinsky justify that claim and explore its implications and techniques for satisfying the requirements. Using motivating examples, you'll explore reproducibility, security, traceability, and auditing, plus some unique characteristics of models in production settings.

11:50am–12:30pm Tuesday, March 17, 2020
Location: LL20C
Secondary topics:  Data Quality
Shradha Ambekar (Intuit), Sunil Goplani (Intuit)
Debugging data pipelines is nontrivial, and finding the root cause can take hours to days. Shradha Ambekar and Sunil Goplani outline how Intuit built a self-serve tool that automatically discovers data pipeline lineage and applies anomaly detection to detect and help debug issues in minutes, establishing trust in metrics and improving developer productivity by 10x–100x.

1:45pm–2:25pm Tuesday, March 17, 2020
Location: LL20C
Secondary topics:  Security and Privacy
Haopei Wang (DataVisor)
Haopei Wang details the design and implementation of a system that automatically extracts fraud-related features for digital identifiers commonly collected by online services. You'll be able to address real-time feature computation and create templates for feature generation. The system has been applied successfully to fraud detection and good-user analysis.

2:35pm–3:15pm Tuesday, March 17, 2020
Location: LL20C
Secondary topics:  Security and Privacy
Amanda Chessell (IBM), John Mertic (Linux Foundation)
Building on its success at establishing standards in the Apache Hadoop data platform, the ODPi (Linux Foundation) turns its focus to the next big data challenge: enabling metadata management and governance at scale across the enterprise. Mandy Chessell and John Mertic discuss how the ODPi's guidance on governance (GoG) aims to create an open data governance ecosystem.

4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL20C
Sihui Hu (Microsoft), Dom Divakaruni (Microsoft)
Data scientists need a way to ensure result reproducibility. Sihui "May" Hu and Dominic Divakaruni unpack how to retrieve data-to-data, data-to-model, and model-to-deployment lineages in one graph to achieve reproducible and reliable machine learning at scale. You'll discover effective ways to track the full lineage from data preparation to model training to inference.

5:05pm–5:45pm Tuesday, March 17, 2020
Location: LL20C
Lars George (Okera)
With multiple security layers and different departments responsible for data, managing security and governance within AWS identity and access management (IAM) poses a number of challenges. Lars George identifies the security layers, explains why IAM is such a conundrum and whether it actually slows down data projects, and outlines the access control requirements for data lakes.

11:00am–12:30pm Wednesday, March 18, 2020
Location: 210 C/G
Willy Lulciuc (WeWork)
Willy Lulciuc explains how lineage metadata in conjunction with a data catalog helps improve the overall quality of data. You'll dive into complex inter-DAG dependencies in Airflow and get a hands-on introduction to data lineage using Marquez. You'll also develop strong debugging techniques and learn how to effectively apply them.

1:45pm–2:25pm Wednesday, March 18, 2020
Location: LL20C
Jin Hyuk Chang (Lyft), Tao Feng (Lyft)
Jin Hyuk Chang and Tao Feng offer a glimpse of Amundsen, an open source data discovery and metadata platform from Lyft. Since it was open-sourced, Amundsen has been used and extended by many different companies within the community.

2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL20C
Secondary topics:  Security and Privacy
Nong Li (Okera)
The evolution from storing data in a warehouse to a hybrid infrastructure of on-premises and cloud data lakes has enabled agility and scale. Nong Li looks at the problems between data and metadata, the privacy and security risks associated with them, how to avoid the pitfalls of these challenges, and why companies need to get it right by enforcing security and privacy consistently across all applications.

4:15pm–4:55pm Wednesday, March 18, 2020
Location: LL20C
Secondary topics:  Security and Privacy
Lisa Joy Rosner (Otonomo)
As cars gain more advanced features, the role of customer privacy and responsible data stewardship becomes an important focus for auto manufacturers and drivers. Lisa Joy Rosner discusses the future of connected vehicles, data compliance measures, and the impact of related policies like GDPR and the California Consumer Privacy Act (CCPA).

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

Become a sponsor

For information on exhibiting or sponsoring a conference

pr@oreilly.com

For media/analyst press inquiries