At Kaggle, we run machine learning projects internally and also crowdsource some projects through open competitions. We’ll cover the gritty details of the most fascinating competitions we’ve hosted to date, from optimizing early-stage drug discovery pipelines to algorithmically scoring student-written essays, and explore the methods that won them.
After working on hundreds of machine learning projects, we’ve seen many common mistakes that can derail a project. These include:
- Data leakage
- Overfitting
- Poor data quality
- Solving the wrong problem
- Sampling errors
- And many more

In this talk, we will examine these machine learning gremlins in detail and learn to recognize their many disguises. Afterward, you will be prepared to spot the gremlins in your own work and keep them from killing an otherwise successful project.
Ben Hamner is the Director of Engineering at Kaggle. He has worked on machine learning problems in a variety of domains, including natural language processing, computer vision, web classification, and neuroscience. Prior to joining Kaggle, he applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated from Duke University with a BSE in Biomedical Engineering, Electrical Engineering, and Math.