Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

The limits of inference: What data scientists can learn from the reproducibility crisis in science

11:00am–11:40am Thursday, March 8, 2018
Average rating: 4.86 (7 ratings)

Who is this presentation for?

  • Senior data scientists, strategic decision makers, and executives

Prerequisite knowledge

  • A working knowledge of statistics (frequentist or Bayesian)
  • A conceptual understanding of machine learning

What you'll learn

  • Understand why not all problems can be answered using statistical inference and data
  • Learn a framework to differentiate between data-friendly problems and problems data is unlikely to help solve

Description

Researchers in genetics, neuroscience, molecular biology, and psychology all helped pioneer formal statistical inference methodologies to learn from multidimensional datasets. However, today these fields are plagued by an unreliable and uninterpretable body of literature. Experts are calling it a reproducibility crisis. The cause of this chaos is now understood to be widespread misapplication of statistics, particularly when applied at a large scale.

Data science, machine learning, and many big data technologies rely on the same statistical methodology as these scientific fields. As practitioners of burgeoning disciplines, we must ask ourselves some probing questions. Can we learn from the mistakes of the past to better use and scale statistical inference? Can we avoid the same crisis of integrity that now plagues biology, psychology, and neuroscience?

Businesses that understand this problem of scale will beat the competition by being able to better allocate resources and minimize risk. Clare Gollnick considers the philosophy of data science and shares a framework that explains (and even predicts) the likelihood of success of a data project.

Topics include:

  • How so many scientific fields ended up in a reproducibility crisis
  • Why statistical inference fails when applied at scale and how this relates to the infinite monkey theorem
  • Are new statistical techniques able to avoid this pitfall?
  • Is more data always better? Hume’s problem of induction and how it applies to modern data science
  • How to predict in advance whether it is possible to solve a given problem using data and inference as tools

Clare Gollnick

NS1

Clare Gollnick is the director of data science at NS1, an industry-leading DNS and traffic management platform. An expert on statistical inference and machine learning, Clare writes and speaks frequently about the intersection of data, philosophy, and entrepreneurship. Previously, as chief technology officer of Terbium Labs, Clare led a diverse team of engineers and data scientists. Her team built innovative information security products that prevented fraud while still protecting consumer privacy. Clare has published a number of academic papers on information processing within neural networks, validation of new statistical methods, and the philosophy of science. Clare holds a PhD from Georgia Tech and a BS from UC Berkeley.