The dangers of data leakage in production machine learning systems

Who is this presentation for?
- Data scientists, engineers, and product managers
Level
Intermediate
Description
According to published research, data leakage is frequently found in public datasets, and it is likely to be at least as widespread in the private sector, where there’s less transparency.
Data leakage occurs when a model gains access to data that it shouldn’t have access to. AI systems can fail catastrophically in production if leakage is not dealt with properly. Martin Goodson details the four main manifestations of data leakage and explains how to recognize the warning signs. By mastering several key scientific principles, you can mitigate the risk of failure.
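As a rough illustration of the definition above (not taken from the talk, and not necessarily one of the four manifestations it covers), the sketch below shows one commonly cited form of leakage: duplicate records that end up on both sides of a train and test split. The dataset, the lookup "model", and every function name here are hypothetical and chosen only for the demonstration. Labels are coin flips, so an honest evaluation should score near chance.

```python
import random
from collections import Counter


def make_rows(n_records=2000, seed=0):
    """Toy data (an assumption for this sketch): each record appears twice,
    and its label is a coin flip, so there is no real signal to learn."""
    rng = random.Random(seed)
    rows = []
    for record_id in range(n_records):
        label = rng.randint(0, 1)
        rows.extend([(record_id, label), (record_id, label)])
    return rows


def fit_lookup(train):
    """A hypothetical 'model' that memorises training labels and falls back
    to the majority training class for records it has never seen."""
    memory = {record_id: label for record_id, label in train}
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda record_id: memory.get(record_id, majority)


def accuracy(model, test):
    return sum(model(record_id) == label for record_id, label in test) / len(test)


def row_level_split(rows, seed=1):
    """Leaky: shuffles individual rows, so a record's twin can land in train."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def record_level_split(rows, seed=1):
    """Cleaner: keeps both copies of a record on the same side of the split."""
    ids = sorted({record_id for record_id, _ in rows})
    random.Random(seed).shuffle(ids)
    train_ids = set(ids[: len(ids) // 2])
    train = [row for row in rows if row[0] in train_ids]
    test = [row for row in rows if row[0] not in train_ids]
    return train, test


rows = make_rows()

leaky_train, leaky_test = row_level_split(rows)
leaky_acc = accuracy(fit_lookup(leaky_train), leaky_test)

clean_train, clean_test = record_level_split(rows)
clean_acc = accuracy(fit_lookup(clean_train), clean_test)

print(f"row-level split (leaky):    {leaky_acc:.2f}")
print(f"record-level split (clean): {clean_acc:.2f}")
```

In this toy setup the row-level split reports a score well above chance even though the labels are pure noise, which is the kind of inflated offline metric that would not carry over to unseen data in production.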
Prerequisite knowledge
- Familiarity with supervised learning, classification, precision, recall, accuracy, cross-validation, and train and test split
What you'll learn
- The errors that data leakage causes and how to build systems that protect against its common manifestations

Martin Goodson
Evolution AI
Martin Goodson is the chief scientist and CEO of Evolution AI, where he specializes in large-scale natural language processing. Martin has designed data science products that are in use at companies like Dun & Bradstreet, Time Inc., John Lewis, and Condé Nast. Previously, Martin was a statistician at the University of Oxford, where he conducted research on statistical matching problems for DNA sequences. He runs the largest community of machine learning practitioners in Europe, Machine Learning London, and convenes the CBI/Royal Statistical Society roundtable, AI in Financial Services. Martin’s work has been covered by publications such as the Economist, Quartz, Business Insider, TechCrunch, and others.