We’re told data science is the key to unlocking the value in big data, but nobody seems to agree just what it is. Is it engineering, statistics. . .both? David Donoho’s “50 Years of Data Science,” which is itself a survey of Tukey’s “Future of Data Analysis,” offers one of the best criticisms of the hype around data science from a statistics perspective, arguing that data science is not new (if it’s anything at all) and calling statistics to action (again) to take back the field with a more practical, modern view of what it means to teach statistics and data science.
Drawing on his blog post, Sean Owen responds, offering counterpoints from an engineer, in search of a better understanding of how to teach and practice data science in 2017. Sean explores some key points in the history of data science from the past 50 years in order to build up a more complete view of how data science sprung out of statistics and merged with computer engineering and concludes by comparing Donoho’s view of what it means to build data science capability with one taken from the experience organizations doing so in the context of Apache Hadoop, Spark, and other big data tools.
Sean Owen is director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the coauthor of Advanced Analytics on Spark and Mahout in Action. Previously, Sean was a senior engineer at Google.
©2017, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org