Sep 23–26, 2019

Schedule: Data quality, data governance and data lineage sessions

Much of the ML in use within companies falls under supervised learning, which means proper training data (or labeled examples) are essential. The rise of deep learning has made this even more pronounced, as many modern neural network architectures rely on large amounts of training data. Issues pertaining to data security, privacy and governance persist and are not necessarily unique to ML applications. But the hunger for large amounts of training data, the advent of new regulations like GDPR, and the importance of managing risk mean that a stronger emphasis on reproducibility and data lineage is very much needed.

9:00am–5:00pm Tuesday, September 24, 2019
Location: 1A 08
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.

1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 23/24
Wim Stoop (Cloudera)
Establishing enterprise-wide security and governance remains a challenge for most organisations. Integrations and exchanges across their landscape are costly to manage and maintain, and typically work in one direction only. In this session, we'll discuss how ODPi's Egeria standard and framework removes these challenges and how Cloudera and its partners use it to deliver value for customers.

2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1A 23/24
Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)
How do you scale metadata to an organization of 10,000 employees and 1M+ data assets, at an AI-enabled company that ships code to the site three times a day? We describe the journey of LinkedIn’s metadata from a two-person back-office team to a central hub powering data discovery, AI productivity and automatic data privacy. Different metadata strategies and our battle scars will be revealed!

2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1E 07/08
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
Imagine a business insight showing a sudden spike. Debugging data pipelines is non-trivial, and finding the root cause can take hours or even days! We’ll share how Intuit built a self-serve tool that automatically discovers data pipeline lineage and tracks every change that impacts a pipeline. This helps debug pipeline issues in minutes, establishing trust in data while improving developer productivity.

2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1A 23/24
Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)
At Uber’s scale and pace of growth, a robust system for discovering and managing various entities, from datasets to services to pipelines, and their relevant metadata is not just nice to have: it is absolutely integral to making data useful at Uber. In this talk, we will explore the current state of metadata management and end-to-end data flow solutions at Uber and what’s coming next.

4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 23/24
Max Neunhöffer (ArangoDB), Joerg Schad (Suki)
Machine learning platforms are becoming more complex, with different components each producing their own metadata. Currently, most components provide their own way of storing metadata. In this talk, we propose a first draft of a common Metadata API and demo a first implementation of this API in Kubeflow using ArangoDB, which is a native multi-model database.

4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1E 10/11
Andrew Brust (ZDNet | Blue Badge Insights)
A primer on data catalogs and a review of the major vendors and platforms in the market. Includes discussion of the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage and even software/applications. Also covers AI's role in the data catalog world and an analysis of data catalog futures.

5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 23/24
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding which data sets are available for consumption. To address this challenge, a custom metadata management tool was recently deployed as a new capability at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing, to aid search.

5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1E 14
Brindaalakshmi K (Independent Consultant)
There is no standard for the collection of gender data. This session looks at the implications of this gap in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion permeates beyond the public sector into private sector services.

2:05pm–2:45pm Thursday, September 26, 2019
Location: 1E 07/08
Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
Learn how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. Link customer records across different databases (e.g., with different name spellings or addresses). Match external product lists against your own catalog, such as lists of hazardous goods. Solve tough challenges to prepare and cleanse data for analysis.

2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 12/14
Andrew Leamon (Comcast), Wadkar Sameer (Comcast NBCUniversal)
An overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale.

4:35pm–5:15pm Thursday, September 26, 2019
Location: 1A 15/16
Neelesh Salian (Stitch Fix)
It is important to understand why data lineage is needed for an organization. Once the purpose is defined, we can talk about how to go about building such a system.