Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

What "50 Years of Data Science" leaves out

Sean Owen (Cloudera)
16:35–17:15 Wednesday, 24 May 2017
Data science and advanced analytics
Location: Hall S21/23 (B)
Level: Beginner
Average rating: 3.80 (5 ratings)

Who is this presentation for?

  • Data scientists, statisticians, analysts, and those building data science capability

What you'll learn

  • Understand the history of statistics leading up to modern data science and why data science can be considered a movement
  • Gain a more balanced view of what building data science capability means today


We’re told data science is the key to unlocking the value in big data, but nobody seems to agree just what it is. Is it engineering, statistics... both? David Donoho’s “50 Years of Data Science,” which is itself a survey of Tukey’s “Future of Data Analysis,” offers one of the best criticisms of the hype around data science from a statistics perspective. It argues that data science is not new (if it’s anything at all) and calls statistics to action (again) to take back the field with a more practical, modern view of what it means to teach statistics and data science.

Drawing on his blog post, Sean Owen responds with counterpoints from an engineer’s perspective, in search of a better understanding of how to teach and practice data science in 2017. Sean explores some key points in the history of data science from the past 50 years to build a more complete view of how data science sprang out of statistics and merged with computer engineering. He concludes by comparing Donoho’s view of what it means to build data science capability with one drawn from the experience of organizations doing so in the context of Apache Hadoop, Spark, and other big data tools.

Sean Owen


Sean Owen is director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the coauthor of Advanced Analytics with Spark and Mahout in Action. Previously, Sean was a senior engineer at Google.