Office Hour with Claudia Perlich

Claudia Perlich (Dstillery)
Office Hour, Exhibit Hall (Office Hours B)
  • Causal analysis in observational data: Rather than spending effort and money on A/B tests, we show how to use predictive modeling to derive a reliable measure of the actual effect that an ad has.
  • Inventory optimization: How you can use predictive modeling (logistic regression) to evaluate a particular inventory with respect to its impact on the conversion probability of customers. More generally, we are looking at counterfactual modeling that deals with biases in the data samples used for predictive modeling. You can only get enough data when you look under the streetlight, but you need to illuminate the dark: staged predictive models can combine the predictive information from under the streetlight with the ‘correct’ adjustments needed to work beyond it.
  • Finding groups of URLs: clustering websites without crawling or parsing the content
  • Lessons from running a large-scale, semi-automated machine learning system in production

And more generally, beyond advertising applications:

  • On the dangers of leakage and why your data scientist has to pull his or her own data
  • Ranking and probability estimation in millions of dimensions

Claudia Perlich


Prior to joining Dstillery (formerly Media6Degrees), Claudia Perlich spent five years in the Data Analytics Research group at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She has authored over 30 scientific publications and holds multiple patents in the area of machine learning. Claudia has won many data mining competitions, including the prestigious 2007 KDD Cup on movie ratings, the 2008 KDD Cup on breast cancer detection, and the 2009 KDD Cup on churn and propensity predictions for telecommunication customers. Claudia received her Ph.D. in Information Systems from the Stern School of Business, New York University, in 2005, and holds a Master of Computer Science from Colorado University.


