Presented By O’Reilly and Cloudera

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

The limits of inference: What data scientists can learn from the reproducibility crisis in science

Clare Gollnick (NS1)

11:00am–11:40am Thursday, March 8, 2018

Data science and machine learning, Data-driven business management
Location: LL20 A

Average rating: 4.86 (7 ratings)

Who is this presentation for?

Senior data scientists, strategic decision makers, and executives

Prerequisite knowledge

A working knowledge of statistics (frequentist or Bayesian)
A conceptual understanding of machine learning

What you'll learn

Understand why not all problems can be answered using statistical inference and data
Learn a framework to differentiate between data-friendly problems and problems data is unlikely to help solve

Description

Researchers in genetics, neuroscience, molecular biology, and psychology all helped pioneer formal statistical inference methodologies to learn from multidimensional datasets. However, today these fields are plagued by an unreliable and uninterpretable body of literature. Experts are calling it a reproducibility crisis. The cause of this chaos is now understood to be widespread misapplication of statistics, particularly when applied at a large scale.

Data science, machine learning, and many big data technologies rely on the same statistical methodology as these scientific fields. As practitioners of burgeoning disciplines, we must ask ourselves some probing questions. Can we learn from the mistakes of the past to better use and scale statistical inference? Can we avoid the same crisis of integrity that now plagues biology, psychology, and neuroscience?

Businesses that understand this problem of scale will beat the competition by being able to better allocate resources and minimize risk. Clare Gollnick considers the philosophy of data science and shares a framework that explains (and even predicts) the likelihood of success of a data project.

Topics include:

How so many scientific fields ended up in a reproducibility crisis
Why statistical inference fails when applied at scale and how this relates to the infinite monkey theorem
Are new statistical techniques able to avoid this pitfall?
Is more data always better? Hume’s problem of induction and how it applies to modern data science
How to predict in advance whether it is possible to solve a given problem using data and inference as tools
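
One common reading of the "inference fails at scale" topic above is the multiple-comparisons problem: run enough tests on pure noise and some will look significant by chance. The abstract does not say that this is the mechanism the talk covers, so the following is only a minimal sketch under that assumption. The function names, the 0.05 threshold, and the assumption of independent tests are illustrative choices and are not taken from the talk.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_discoveries(num_tests: int, alpha: float = 0.05, seed: int = 0) -> int:
    """Count 'significant' results on pure noise.

    Under a true null hypothesis a p-value is uniform on [0, 1],
    so each simulated p-value is a single uniform draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        rate = familywise_error_rate(m)
        hits = simulate_false_discoveries(m)
        print(f"{m:>5} tests: P(at least one false positive) = {rate:.3f}, "
              f"simulated false positives = {hits}")
```

With these assumptions, the chance of at least one spurious finding passes 50% at just 14 tests, which is the sense in which enough monkeys at enough typewriters will eventually produce a "result".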

Clare Gollnick

NS1

Clare Gollnick is the director of data science at NS1, an industry-leading DNS and traffic management platform. An expert on statistical inference and machine learning, Clare writes and speaks often on the intersection of data, philosophy, and entrepreneurship. Previously, as chief technology officer of Terbium Labs, Clare led a diverse team of engineers and data scientists. Her team built innovative information security products, preventing fraud while still protecting consumer privacy. Clare has published a number of academic papers on information processing within neural networks, validation of new statistical methods, and the philosophy of science. Clare holds a PhD from Georgia Tech and a BS from UC Berkeley.



©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com