Traditionally, the data science community has overlooked causal inference. Often, the ability to predict outcomes—who will purchase what book, who will click which ad, who will vote for a particular candidate—is good enough. But how do we avoid showing ads to people who would have purchased the book anyway? How do we allocate resources to get-out-the-vote campaigns to maximally mobilize people who would not have voted otherwise? These problems fall under the domain of randomized controlled experiments and causal inference, where we need techniques to model the impact of applying a treatment on an outcome of interest. The individual-level predictions that come out of such models tell us how a particular person will respond to a particular ad or intervention and can be used for optimally assigning treatments to individuals.
As a concrete example, imagine we would like to promote a book. We can divide our potential audience into three groups: people who will purchase the book regardless of any promotion, those who will purchase the book only if offered a slight discount, and those who won’t purchase the book regardless of discounts or promotions. Approaching this as a traditional machine-learning problem, we might try to build a model predicting promotion redemption or ad clicks, which would have us spending resources on people from both of the first two groups. Ideally, since people in the first group would buy the book anyway, we would like to exclude them from promotional activities. Doing this requires predicting two things: the likelihood that a person would buy the book without any promotion and the likelihood that the same person would buy it after exposure to a promotion. The difference between the two is the effect of the promotion on that person.
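To make the book example concrete, here is a minimal sketch in Python of the difference between ranking people by purchase rate and ranking them by the change in purchase rate that a promotion causes. It is an illustration under stated assumptions, not the speakers' method: the segment labels are treated as directly observed (in practice they are not, and a model over individual features would stand in for them), the experiment is assumed to be randomized, and all counts, the record layout, and the function names are hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from a randomized experiment.

    records: iterable of (segment, treated, purchased) tuples, where
    treated and purchased are booleans (a hypothetical data layout).
    Returns {segment: (p_control, p_treated, uplift)}.
    """
    # counts[segment][treated] = [number of people, number of purchases]
    counts = defaultdict(lambda: {False: [0, 0], True: [0, 0]})
    for segment, treated, purchased in records:
        cell = counts[segment][bool(treated)]
        cell[0] += 1
        cell[1] += int(purchased)

    result = {}
    for segment, arms in counts.items():
        n_c, buy_c = arms[False]
        n_t, buy_t = arms[True]
        if n_c == 0 or n_t == 0:
            continue  # uplift needs both a control and a treated arm
        p_c = buy_c / n_c
        p_t = buy_t / n_t
        result[segment] = (p_c, p_t, p_t - p_c)
    return result


def make_toy_records():
    """Build made-up experiment data for the three groups in the text."""
    # (number of people, number of buyers) per arm; illustrative only.
    spec = {
        "sure_thing": {"control": (100, 90), "treated": (100, 90)},
        "persuadable": {"control": (100, 10), "treated": (100, 40)},
        "lost_cause": {"control": (100, 0), "treated": (100, 0)},
    }
    records = []
    for segment, arms in spec.items():
        for arm, (n, buyers) in arms.items():
            treated = arm == "treated"
            records += [(segment, treated, i < buyers) for i in range(n)]
    return records


if __name__ == "__main__":
    estimates = uplift_by_segment(make_toy_records())
    # Sort by estimated uplift, largest first.
    ranked = sorted(estimates.items(), key=lambda kv: -kv[1][2])
    for segment, (p_c, p_t, uplift) in ranked:
        print(f"{segment:12s} control={p_c:.2f} "
              f"treated={p_t:.2f} uplift={uplift:+.2f}")
```

In this toy data, the group with the highest purchase rate after a promotion is the one that would have bought anyway, while the group with the largest difference between treated and control rates is the one worth targeting. A model that predicts only redemption would favor the former; the subtraction is what separates the two.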
This sort of modeling is variously known as persuasion modeling, uplift modeling, or heterogeneous treatment effects modeling. While there is a rich literature on persuasion modeling in the social sciences and marketing, such techniques are often unknown and underutilized in the machine-learning and data science communities. Likewise, techniques from the machine-learning and data science communities often don’t make their way back to the social science and marketing realms.
Michelangelo D’Agostino and Bill Lattner share their experience developing machine-learning techniques for predicting treatment responsiveness from randomized controlled experiments and explore the use of these techniques. Michelangelo and Bill start with a summary of randomized controlled experiments and the persuasion modeling problem, then cover both baseline and cutting-edge techniques for building these models before presenting ways to do evaluation and model selection. Along the way, they’ll discuss several successfully executed case studies from their work at Civis Analytics.
Michelangelo D’Agostino is the vice president of data science and engineering at ShopRunner, where he leads a team that develops statistical models and writes software that leverages their unique cross-retailer ecommerce dataset. Previously, Michelangelo led the data science R&D team at Civis Analytics, a Chicago-based data science software and consulting company that spun out of the 2012 Obama reelection campaign, and was a senior analyst in digital analytics with the 2012 Obama reelection campaign, where he helped to optimize the campaign’s email fundraising juggernaut and analyzed social media data. Michelangelo has been a mentor with the Data Science for Social Good Fellowship. He holds a PhD in particle astrophysics from UC Berkeley and got his start in analytics sifting through neutrino data from the IceCube experiment. Accordingly, he spent two glorious months at the South Pole, where he slept in a tent salvaged from the Korean War and enjoyed the twice-weekly shower rationing. He’s also written about science and technology for the Economist.
Once upon a time, Bill Lattner was a civil engineer. Now, he is a data scientist on the R&D team at Civis Analytics, where he spends most of his time writing tools for other data scientists, primarily in Python but also in R and occasionally Go. Prior to joining Civis, Bill was at Dishable, working on recommender systems and predicting the dining habits of Chicagoans.