It’s obvious that if we perfectly understood the systems we work with, there would be less need to do data science, but we typically have only a coarse, imperfect understanding of these systems. Less obvious, however, is what we stand to gain (and what we might lose) by incorporating domain understanding into data science.
Matthew Smith demonstrates how incorporating domain understanding can yield unexpectedly high predictive accuracy, new insights for domain experts and customers into how the system functions, and computationally efficient prediction algorithms, in applications such as predicting crops, global carbon emissions, diseases, ecosystems, species distributions, weather, roads, and riots.
But what we often lose in this process is time. Getting our heads around the science and the appropriate methodologies is hard, and meeting these challenges will require new software and software features that better enable the incorporation of scientific understanding into data science applications. Matthew ends by illustrating a few prototype solutions designed to speed up the process of producing valuable results.
Always attracted to solving real-world problems involving complex dynamical systems, Matthew Smith initially trained as an ecologist, then undertook an applied mathematics PhD to up-skill in quantitative techniques before joining the Computational Science Laboratory at Microsoft Research, Cambridge. Matthew has become renowned for completing extremely difficult predictive analytics research, principally using prototype research software. He now applies those skills to solve real-world data science problems.
©2016, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.