Sep 23–26, 2019

Schedule: Data quality, data governance and data lineage sessions

Much of the ML in use within companies falls under supervised learning, which means proper training data (labeled examples) is essential. The rise of deep learning has made this even more pronounced, as many modern neural network architectures rely on large amounts of training data. Issues pertaining to data security, privacy, and governance persist and are not necessarily unique to ML applications. But the hunger for large amounts of training data, the advent of new regulations like GDPR, and the importance of managing risk mean that a stronger emphasis on reproducibility and data lineage is very much needed.

9:00am–5:00pm Tuesday, September 24, 2019
Location: 1A 08
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Brian Lynch (TD Bank Group), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Karan Jaswal (Cinchy), Moto Tohda (Tokyo Century (USA)), Viridiana Lourdes (Ayasdi), Peter Swartz (Altana Trade), Mikheil Nadareishvili (TBC Bank)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 23/24
Wim Stoop (Cloudera), Srikanth Venkat (Cloudera)
Establishing enterprise-wide security and governance remains a challenge for most organizations. Integrations and exchanges across the landscape are costly to manage and maintain, and typically work in one direction only. Wim Stoop and Srikanth Venkat explore how ODPi's Egeria standard and framework removes the challenges and is leveraged by Cloudera and partners alike to deliver value.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1A 23/24
Shirshanka Das (LinkedIn), Mars Lan (LinkedIn)
Imagine scaling metadata for an AI-enabled organization that has 10,000 employees and 1M+ data assets and ships code to the site three times a day. Shirshanka Das and Mars Lan dive into LinkedIn’s metadata journey from a two-person back-office team to a central hub powering data discovery, AI productivity, and automatic data privacy. They reveal metadata strategies and the battle scars.
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1E 07/08
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
A business insight shows a sudden spike. It can take hours, or even days, of debugging data pipelines to find the root cause. Shradha Ambekar, Sunil Goplani, and Sandeep Uttamchandani outline how Intuit built a self-service tool that automatically discovers data pipeline lineage and tracks every change, helping debug issues in minutes and establishing trust in data while improving developer productivity.
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1A 23/24
Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber)
Uber takes being data driven to the next level. A robust system for discovering and managing various entities, from datasets to services to pipelines, along with their relevant metadata, isn't just nice to have; it's absolutely integral to making data useful. Kaan Onuk, Luyao Li, and Atul Gupte explore the current state of metadata management, end-to-end data flow solutions at Uber, and what’s coming next.
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 23/24
Max Neunhöffer (ArangoDB), Joerg Schad (ArangoDB)
Machine learning platforms are becoming more complex, with different components each producing its own metadata and storing it in its own way. Max Neunhöffer and Joerg Schad propose a first draft of a common metadata API and demonstrate a first implementation of this API in Kubeflow using ArangoDB, a native multimodel database.
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1E 10/11
Andrew Brust (Blue Badge Insights | ZDNet)
Andrew Brust provides a primer on data catalogs and a review of the major vendors and platforms in the market. He examines the use of data catalogs with classic and newer data repositories, including data warehouses, data lakes, cloud object storage, and even software and applications. You'll learn about AI's role in the data catalog world and get an analysis of data catalog futures.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 23/24
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems has grown at Bayer, so has the difficulty of locating and understanding which datasets are available for consumption. Naghman Waheed and John Cooper outline a custom metadata management tool recently deployed at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing to aid searches.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1E 14
Brindaalakshmi K (Independent Consultant)
There's a lack of standards for the collection of gender data. Brindaalakshmi K takes a look at the implications of this gap in the context of a developing country like India, the exclusion of individuals beyond the binary genders of male and female, and how this exclusion extends beyond the public sector into private sector services.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 03
Neelesh Salian (Stitch Fix)
Every data team has to build an ecosystem that sustains the data, the users, and the use of the data itself. This data ecosystem comes with its own challenges during building, maintenance, and enhancement. Neelesh Salian dives into the importance of data lineage for an organization. You'll explore how to go about building a data lineage system.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1E 07/08
Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)
Nikki Rouda and Janisha Anand demonstrate how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. You'll also learn how to link customer records across different databases, match external product lists against your own catalog, and solve tough challenges to prepare and cleanse data for analysis.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 12/14
Mumin Ransom (Comcast), Nick Pinckernell (Comcast)
Mumin Ransom gives an overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale.
  • Cloudera
  • O'Reilly
  • Google Cloud
  • IBM
  • Cisco
  • Dataiku
  • Intel
  • Io-Tahoe
  • MemSQL
  • Microsoft Azure
  • Oracle Cloud Infrastructure
  • SAS
  • Arcadia Data
  • BMC Software
  • Hazelcast
  • SAP
  • Amazon Web Services
  • Anaconda
  • Esri
  • Infoworks.io, Inc.
  • Kyligence
  • Pitney Bowes
  • Talend
  • Google Cloud
  • Confluent
  • DataStax
  • Dremio
  • Immuta
  • Impetus Technologies Inc.
  • Keyence
  • Kyvos Insights
  • StreamSets
  • Striim
  • Syncsort
  • SK holdings C&C

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquiries