Researchers in genetics, neuroscience, molecular biology, and psychology helped pioneer formal statistical inference methodologies for learning from multidimensional datasets. Today, however, these fields are plagued by an unreliable and uninterpretable body of literature—what experts are calling a reproducibility crisis. The cause is now understood to be widespread misapplication of statistics, particularly at scale.
Data science, machine learning, and many big data technologies rely on the same statistical methodology as these scientific fields. As practitioners of these burgeoning disciplines, we must ask ourselves some probing questions. Can we learn from the mistakes of the past to better use and scale statistical inference? Can we avoid the same crisis of integrity that now plagues biology, psychology, and neuroscience?
Businesses that understand this problem of scale will outpace the competition by allocating resources more effectively and minimizing risk. Clare Gollnick considers the philosophy of data science and shares a framework that explains (and even predicts) the likelihood of success of a data project.
Clare Gollnick is the director of data science at NS1, based in New York City.
©2018, O'Reilly Media, Inc.