Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

A deep dive into DeepDive

Mike Cafarella (University of Michigan)
4:30pm–5:00pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Tags: ai
Average rating: 4.60 (10 ratings)

Prerequisite knowledge

Attendees should have a general understanding of machine-learning practices.


DeepDive is a trained system that uses machine learning to cope with various forms of noise and imprecision. DeepDive is designed to make it easy for users who do not have machine-learning expertise to train the system through low-level feedback via the MindTagger interface and discover rich, structured domain knowledge via rules. Mike Cafarella offers an introduction to DeepDive, exploring the key technical innovations that enable DeepDive to produce statistical inference at massive scale.

Mike Cafarella

University of Michigan

Mike Cafarella is one of the cofounders of the Apache Hadoop and Nutch open source projects. Mike is also an assistant professor of computer science and engineering at the University of Michigan. His research interests include databases, information extraction, data integration, and data mining. Recently, he cofounded Lattice Data, a company that aims to transform “dark data,” such as unstructured text documents and reports, into high-quality structured databases.