Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

BayesDB: Query the probable implications of your data

2:40pm–3:20pm Wednesday, 03/30/2016

Prerequisite knowledge

Attendees should be familiar with SQL and data manipulation.


Statistics and machine learning are part of many new systems but have been described as the “high-interest credit card of technical debt.” Is it possible to make statistical inference broadly accessible to nonstatistician programmers without sacrificing mathematical rigor or inference quality? BayesDB is a system that enables users to query the probable implications of their data as directly as SQL enables them to query the data itself.

BayesDB abstracts away model-specific details of statistical inference by providing a declarative query language, the Bayesian Query Language (BQL). BQL extends SQL and provides queries for Bayesian data analysis that are answered by averaging over an implicit space of probabilistic models. BayesDB provides a machine-assisted modeling language, MML, that enables domain experts to specify qualitative constraints and produces baseline models that are suitable for data cleaning, anomaly detection, variable selection, and a broad class of prediction tasks. BayesDB also provides an extensible interface for plugging in statistical and algorithmic models and composing multiple kinds of models into a single population.
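To make "averaging over an implicit space of probabilistic models" concrete, here is a minimal sketch in plain Python. It is not BayesDB code and uses none of its API. The candidate Gaussian models, the uniform prior over them, and the function names are all assumptions chosen for illustration. Each candidate model is weighted by how well it explains the observed data, and the query is answered with the weighted average of the models' answers.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def posterior_weights(models, data):
    """Weight each (mean, std) model by its likelihood on the data.

    Assumes a uniform prior over the candidate models (an illustrative choice).
    """
    log_liks = [sum(gaussian_logpdf(x, m, s) for x in data) for (m, s) in models]
    top = max(log_liks)  # subtract the max before exponentiating for stability
    unnormalized = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def averaged_density(x, models, weights):
    """Answer 'how probable is x?' by averaging over the weighted models."""
    return sum(w * math.exp(gaussian_logpdf(x, m, s))
               for w, (m, s) in zip(weights, models))

# Hypothetical candidate models and observations, for illustration only.
models = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
data = [0.9, 1.1, 1.3, 0.8]

weights = posterior_weights(models, data)
density = averaged_density(1.0, models, weights)
print(weights, density)
```

The caller asks only for the density at a point and never selects a single model; the weighting happens behind the query, which is the spirit of the approach described above.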

BayesDB’s domain-general metamodel and extensible architecture have been applied to financial services fraud, microbiome data, clinical data from a study of diabetes, earth satellites, and astronomy. Richard Tibbetts and Vikash Mansinghka explore applications of BayesDB to analyzing and understanding development economics data, in collaboration with the Gates Foundation.

Richard Tibbetts


Richard Tibbetts is currently a Principal Product Manager at Tableau. He was founder and CEO of Empirical Systems (acquired by Tableau in 2018), an MIT spinout building an AI-based data platform that provided decision support to organizations that use structured data. Prior to Empirical, he was founder and CTO of StreamBase, a CEP company (acquired by TIBCO in 2013), as well as a visiting scientist at the Probabilistic Computing Project at MIT.

Vikash Mansinghka


Vikash Mansinghka is a research scientist at MIT, where he leads the Probabilistic Computing Project, and a cofounder of Empirical Systems, a new venture-backed AI startup aimed at improving the credibility and transparency of statistical inference. Previously, Vikash cofounded a venture-backed startup based on his research that was acquired by Salesforce, was an advisor to Google DeepMind, and held graduate fellowships at the National Science Foundation and MIT’s Lincoln Laboratory. He served on DARPA’s Information Science and Technology advisory board from 2010 to 2012 and currently serves on the editorial boards for the Journal of Machine Learning Research and Statistics and Computation. Vikash holds a PhD in computation, an MEng in computer science, and BS degrees in mathematics and computer science, all from MIT. His PhD dissertation on natively probabilistic computation won the MIT George M. Sprowls dissertation award in computer science, and his research on the Picture probabilistic programming language won an award at CVPR.