The quest for high-quality data
“AI starts with good data” is a statement that draws wide agreement from data scientists, analysts, and business owners. Our ability to build complex AI models for prediction, classification, and various analytics tasks has increased significantly, and an abundance of (fairly easy-to-use) tools lets data scientists and analysts provision complex models within days. However, the lack of data and data-quality issues remain the main bottleneck holding back further adoption of AI technologies. Even with advances in building robust models, noisy and incomplete data remain the biggest hurdles to effective end-to-end solutions. Multiple studies suggest that cleaning data is a much more effective investment than enhancing learning robustness.
Ihab Ilyas highlights this data-quality problem and describes the HoloClean framework, a state-of-the-art prediction engine for structured data with direct applications in detecting and repairing data errors, as well as imputing missing labels and values. The framework uses techniques such as data augmentation and self-supervised learning to build models that describe how data is generated and how errors and anomalies are introduced.
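To make the idea of self-supervised imputation concrete, here is a minimal sketch in plain Python. It is not HoloClean's API or algorithm: the function name impute_column, the sample rows, and the simple co-occurrence voting are all assumptions chosen for illustration. The observed cells of a column serve as their own training labels, and a missing cell is filled with the value that most often co-occurs with the other attribute values in its row.

```python
from collections import Counter, defaultdict


def impute_column(rows, target):
    """Fill missing (None) values of `target` using co-occurrence counts
    learned from the rows where `target` is observed.

    Hypothetical helper for illustration only; not part of HoloClean.
    """
    # cooc[(attr, value)][target_value] = how often they appear together
    cooc = defaultdict(Counter)
    prior = Counter()  # overall frequency of each observed target value

    # "Training": observed cells act as labels (self-supervision).
    for row in rows:
        label = row.get(target)
        if label is None:
            continue
        prior[label] += 1
        for attr, value in row.items():
            if attr != target and value is not None:
                cooc[(attr, value)][label] += 1

    # "Prediction": vote for each missing cell; inputs are not mutated.
    repaired = []
    for row in rows:
        if row.get(target) is not None or not prior:
            repaired.append(dict(row))
            continue
        votes = Counter()
        for attr, value in row.items():
            if attr != target and value is not None:
                votes.update(cooc.get((attr, value), {}))
        # Fall back to the most common observed value if nothing co-occurs.
        best = (votes or prior).most_common(1)[0][0]
        repaired.append({**row, target: best})
    return repaired


# Made-up sample data: the last row is missing its city.
rows = [
    {"zip": "60608", "city": "Chicago", "state": "IL"},
    {"zip": "60608", "city": "Chicago", "state": "IL"},
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "60608", "city": None, "state": "IL"},
]
cleaned = impute_column(rows, "city")
print(cleaned[3]["city"])
```

A production system would learn a probabilistic model over many signals rather than counting co-occurrences, but the sketch shows why no hand-labeled training data is needed: the clean part of the table supervises the repair of the rest.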
University of Waterloo
Ihab Ilyas is a professor in the Cheriton School of Computer Science at the University of Waterloo, where his research focuses on big data and database systems, with special interest in data quality and integration, managing uncertain data, rank-aware query processing, and information extraction. Ihab is also a cofounder of Tamr, a startup focusing on large-scale data integration and cleaning. He’s a recipient of the Ontario Early Researcher Award (2009), a Cheriton faculty fellowship (2013), an NSERC Discovery Accelerator Award (2014), and a Google Faculty Award (2014), and he’s an ACM Distinguished Scientist. Ihab is an elected member of the VLDB Endowment board of trustees and an associate editor of ACM Transactions on Database Systems (TODS). He holds a PhD in computer science from Purdue University, West Lafayette.