Over the past five years or so, many scientific and research disciplines have experienced a reproducibility crisis. Here are just a few examples of the evidence that has accumulated:
In 2012, Begley and Ellis reported that when Amgen tried to replicate 53 “landmark” findings from hematology and oncology, it could successfully replicate a mere six. Similarly, scientists from Bayer reported that across 67 key attempts to replicate academic research, the results were replicable enough to be worth further investment only about one-third of the time.
In 2015, Science published the results of the largest replication project ever performed: the Reproducibility Project in Psychology, in which hundreds of researchers around the world attempted to replicate 100 psychology experiments that had been published in three top psychology journals in recent years. Only about 40% of the findings could be successfully replicated, while the rest were either inconclusive or definitively not replicated.
The Reproducibility Project in Cancer Biology set out to replicate the top 50 cancer biology experiments published from 2010 to 2012. Its results so far have been mixed, and most recently, the project had to be scaled back to a mere 18 experiments, mostly because the work proved expensive and time-consuming and because it was difficult to chase down all of the details of the original experiments.
And in August 2018, the Social Sciences Replication Project attempted to replicate all 21 social science experiments that had been published in Science or Nature from 2010 to 2015. Only 13 of the 21 experiments could be replicated, and even then, the effect size was typically about half of what had originally been published.
As practiced in many companies, data science and experimentation can suffer from the same flaws that have created reproducibility problems in everything from medicine to psychology. Indeed, advice from the Harvard Business Review (and elsewhere) can directly lead to inaccurate analyses. (For example, this article recommends that people “slice the data,” which is completely contrary to good practice.)
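One common reason that slicing data is considered risky is the multiple-comparisons problem: the more subgroups are tested, the more likely it is that at least one looks "significant" by chance alone. The sketch below is only an illustration under stated assumptions that are not from the talk: the slices are independent, there is no true effect in any of them, and the significance threshold is 0.05. The function names are hypothetical.

```python
import random


def familywise_false_positive_rate(num_slices: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_slices independent tests with no
    true effect still comes out 'significant' at threshold alpha."""
    return 1.0 - (1.0 - alpha) ** num_slices


def simulate_false_positive_rate(num_slices: int, alpha: float = 0.05,
                                 trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo check of the formula above.

    Assumption: with no true effect, each slice's p-value is uniform on
    [0, 1), so it is drawn directly instead of simulating raw data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count the experiment as a false alarm if any slice has p < alpha.
        if any(rng.random() < alpha for _ in range(num_slices)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(familywise_false_positive_rate(k), 3),
              round(simulate_false_positive_rate(k), 3))
```

Under these assumptions, a single test is a false alarm 5% of the time, while testing 20 slices produces at least one false alarm roughly 64% of the time. Real slices are rarely independent, so the exact figures will differ in practice, but the direction of the effect is the same.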
Stuart Buck identifies the most significant sources of problematic data analysis and details the top solutions that other disciplines have used to improve rigor and reproducibility. Business executives and data scientists who prepare to avoid the reproducibility problem will be able to gather better data, draw more informed conclusions, and ultimately make better decisions that improve their strategic positioning in the market.
Stuart Buck is the vice president of research at Arnold Ventures, one of the leading funders of research to inform public policy. He has given advice to DARPA, IARPA (the intelligence community’s research arm), and the White House Social and Behavioral Sciences Team on rigorous research processes. He has sponsored major efforts showing that even the best scientific research is often irreproducible; this work has been featured in Wired, the Economist, the New York Times, and the Atlantic. He has also published in top journals (such as Science and BMJ) on how to make research more accurate. He holds a PhD in education policy from the University of Arkansas, where he studied econometrics, statistics, and program evaluation; a JD with honors from Harvard Law School, where he was an editor of the Harvard Law Review; and bachelor’s and master’s degrees in music performance from the University of Georgia.
©2019, O'Reilly Media, Inc.