Machine Learning Gremlins

Ben Hamner (Kaggle)

At Kaggle, we run machine learning projects internally and also crowdsource some projects through open competitions. We’ll cover the gritty details of the most fascinating competitions we’ve hosted to date, from optimizing early-stage drug discovery pipelines to algorithmically scoring student-written essays, and explore the methods that won them.

After working on hundreds of machine learning projects, we’ve seen many common mistakes that can derail projects and endanger their success. These include:

- Data leakage
- Overfitting
- Poor data quality
- Solving the wrong problem
- Sampling errors
- and many more

In this talk, we will go through the machine learning gremlins in detail, and learn to identify their many disguises. After this talk, you will be prepared to identify the machine learning gremlins in your own work and prevent them from killing a successful project.

Ben Hamner

Data Scientist, Kaggle

Ben Hamner is the Director of Engineering at Kaggle. He has worked on machine learning problems in a variety of domains, including natural language processing, computer vision, web classification, and neuroscience. Prior to joining Kaggle, he applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated with a BSE in Biomedical Engineering, Electrical Engineering, and Math from Duke University.