From Knowing "What" To Understanding "Why"

Claudia Perlich (Dstillery)
Deep Data, A-B

Companies are collecting data at amazing rates. Yet no matter what industry we are in, we remain largely at a loss as to why people do what they do. Ruth Stanat’s “Drowning in data, yet starved of information” is therefore still very relevant, except that it is not so much information that we are starving for as an understanding of how to use information to pick the right action and to influence people so that they behave in a way that is better for them, better for us, and possibly better for society in general.

The time has come to utilize big data to move from the assessment of correlation to causation. Many industries let inertia guide both the methods they use to gain business insights and the metrics they believe measure the success of their business decisions. But ultimately, every (hopefully data-driven) business decision boils down to some action. And those actions affect our customers, our employees, our clients, our competitors, and our society.

Establishing metrics that quantify the impact of these business decisions, and then measuring them, is the challenge we now face. While this statement seems obvious, an examination of multibillion-dollar industries reveals that in many cases companies are not choosing metrics that truly measure the degree to which business decisions are successful, let alone making optimal decisions based on a data-driven anticipation of their expected impact.
And this is where methods for causal analysis on observational data can lead the way.

Causal methods allow us to measure the impact of our business decisions and even explore potential business decisions. The theory has been around for a while, and we now have in many cases the data to answer the real questions. In our case: What exactly is the impact of showing this ad to this person at this time? And is this really where you should be spending your marketing dollars? An application of causal methods to the online display advertising industry reveals that the current metrics for evaluating advertising campaign success tell a very different story about the value of our advertising choices than the story told when we actually evaluate causal impact.

Of course, if good data analysis for prediction is already fraught with pitfalls, the challenges of getting causal analysis right are greater still. More than ever, careful data preparation is the foundation of reliable answers. Missing one important variable can lead to entirely spurious and misleading results. It did not work for us the first time either, but ultimately it is well worth the effort.
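To make the missing-variable point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the counts, the `interested` variable, and the function names are assumptions, not data or methods from the talk. In this toy population the ad has no effect at all, but people who were already interested are both more likely to be shown the ad and more likely to convert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One group of people in a made-up observational data set."""
    interested: bool   # hypothetical confounder: prior interest in the product
    shown_ad: bool     # whether this group was shown the ad
    people: int
    conversions: int


# Invented counts. Within each interest level the conversion rate is the
# same with or without the ad, so the true effect of the ad is zero.
CELLS = [
    Cell(interested=True,  shown_ad=True,  people=800, conversions=80),
    Cell(interested=True,  shown_ad=False, people=200, conversions=20),
    Cell(interested=False, shown_ad=True,  people=200, conversions=2),
    Cell(interested=False, shown_ad=False, people=800, conversions=8),
]


def rate(cells):
    """Conversion rate pooled over the given cells."""
    return sum(c.conversions for c in cells) / sum(c.people for c in cells)


def naive_lift(cells):
    """Difference in conversion rate, shown vs. not shown, ignoring interest."""
    shown = [c for c in cells if c.shown_ad]
    not_shown = [c for c in cells if not c.shown_ad]
    return rate(shown) - rate(not_shown)


def stratified_lift(cells):
    """Lift computed within each interest level, weighted by stratum size."""
    total = sum(c.people for c in cells)
    lift = 0.0
    for level in (True, False):
        stratum = [c for c in cells if c.interested == level]
        weight = sum(c.people for c in stratum) / total
        lift += weight * naive_lift(stratum)
    return lift


print(f"naive lift:      {naive_lift(CELLS):.3f}")
print(f"stratified lift: {stratified_lift(CELLS):.3f}")
```

Ignoring `interested`, the ad appears to raise conversions by about 5.4 percentage points. Comparing like with like inside each interest level gives a lift of zero. Stratification is only one simple adjustment, and it works here because the single confounder is observed; real campaigns are rarely that tidy.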


Claudia Perlich


Chief Scientist at M6D. PhD in Information Systems from NYU. Spent the last 5 years as a Senior Researcher at IBM Research. Winner of the 2007, 2008, and 2009 KDD Cups.

