Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Data science++: Improving data science by adding domain understanding

Matthew Smith (Microsoft Research)
15:30–16:00 Wednesday, 1/06/2016
Hardcore data science
Location: Capital Suite 4 Level: Advanced
Average rating: 3.50 (2 ratings)

Prerequisite knowledge

Attendees should have a general understanding of data science.


It’s obvious that if we perfectly understood the systems we work with, there would be less need to do data science, but we typically have only a coarse, imperfect understanding of these systems. Less obvious, however, is what we stand to gain (and what we might lose) by incorporating domain understanding into data science.

Matthew Smith demonstrates how to gain unexpectedly high predictive accuracy, new insights for the domain experts and customers into the functioning of the system, and computationally efficient prediction algorithms, in applications such as predicting crops, global carbon emissions, diseases, ecosystems, species distributions, weather, roads, and riots.

But what we often lose in this process is time. Getting our heads around the science and the appropriate methodologies to use is hard, and these challenges will require new software and software features to better enable the incorporation of scientific understanding into data science applications. Matthew ends by illustrating a few prototype solutions designed to speed up the process of producing valuable results.

Matthew Smith

Microsoft Research

Always attracted to solving real-world problems involving complex dynamical systems, Matthew Smith initially trained as an ecologist, then undertook an applied mathematics PhD to up-skill in quantitative techniques, before joining the Computational Science Laboratory at Microsoft Research, Cambridge. Matthew has become renowned for completing extremely difficult predictive analytics research, principally using prototype research software. He now applies those skills to solve real-world data science problems.

Comments on this page are now closed.


Matthew Smith
8/03/2016 20:37 GMT

I’ve seen so many examples of generic statistical techniques being applied to analytics problems (mostly done very well) that I thought it valuable to cover the use of models whose structure represents some expert knowledge of how the system works. Sufficiently powerful ML techniques are now the magic glue that enables us to combine such models with data.