The dangers of data leakage in production machine learning systems
Who is this presentation for?
- Data scientists, engineers, and product managers
According to published research, data leakage is frequently found in public datasets, and it is likely to be at least as widespread in the private sector, where there’s less transparency.
Data leakage occurs when a model is trained on information it would not have access to at prediction time. AI systems that are not protected against leakage can fail catastrophically in production. Martin Goodson details the four main manifestations of data leakage and explains how to recognize the warning signs. By mastering a few key scientific principles, you can mitigate the risk of failure.
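As a minimal illustration (not taken from the talk itself), one common manifestation of leakage is computing preprocessing statistics, such as the mean used for feature scaling, on the full dataset before splitting it into train and test sets. The sketch below uses only synthetic data and standard-library Python; the variable names are illustrative, not part of any real pipeline.

```python
import random

random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)]
train, test = data[:80], data[80:]

# Leaky: the scaling statistic is computed on ALL data,
# so information about the test set flows into training.
full_mean = sum(data) / len(data)

# Correct: the statistic is computed from the training split only.
train_mean = sum(train) / len(train)

# Whenever the two splits differ, the leaky statistic differs from
# the legitimate one -- the model has "seen" the test set indirectly.
leak = abs(full_mean - train_mean)
print(f"shift in scaling statistic caused by leakage: {leak:.4f}")
```

The same principle applies to any fitted preprocessing step (scalers, imputers, feature selectors): fit on the training split only, then apply to the test split.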
Prerequisite knowledge
- Familiarity with supervised learning, classification, precision, recall, accuracy, cross-validation, and train/test splits
What you'll learn
- The errors that data leakage causes and how to build systems that protect against its common manifestations
Martin Goodson is the chief scientist and CEO of Evolution AI, where he specializes in large-scale natural language processing. Martin has designed data science products that are in use at companies like Dun & Bradstreet, Time Inc., John Lewis, and Condé Nast. Previously, Martin was a statistician at the University of Oxford, where he conducted research on statistical matching problems for DNA sequences. He runs the largest community of machine learning practitioners in Europe, Machine Learning London, and convenes the CBI/Royal Statistical Society roundtable, AI in Financial Services. Martin’s work has been covered by publications such as the Economist, Quartz, Business Insider, TechCrunch, and others.