Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Scalable ensemble learning with H2O

Erin Ledell
1:30pm–2:00pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Average rating: 4.25 (12 ratings)

Ensemble machine-learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. Because they are flexible and often outperform individual models, ensembles have been the winning technique in many Kaggle competitions.

Erin Ledell covers the basics of ensemble learning and introduces H2O, a scalable open source machine-learning library. Erin then demonstrates the H2O Ensemble package, which reduces the computational burden of ensemble learning while retaining superior model performance.

The H2O Ensemble software implements the Super Learner (stacking) ensemble algorithm using distributed base learning algorithms from H2O. The Super Learner algorithm uses cross-validated predictions from each base learner to learn the optimal combination of the base learner fits. (The 2007 article “Super Learner” shows why the Super Learner ensemble is an asymptotically optimal system for learning.) Erin dives into these advanced topics and provides code demos that attendees can try out on their own.
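To make the stacking idea concrete, here is a minimal single-machine sketch in Python with NumPy. It is not H2O's distributed implementation and does not use the H2O Ensemble API. The three base learners (a mean predictor, linear regression, and k-nearest neighbors), the 5-fold split, the helper names, and the plain least-squares metalearner are all illustrative assumptions. Non-negative least squares is another common metalearner choice in the Super Learner setting. The sketch shows the core steps: build "level-one" data from cross-validated base-learner predictions, fit a metalearner on that data, and then refit the base learners on all the data for prediction.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Hypothetical helper: split row indices 0..n-1 into k shuffled folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def fit_mean(X, y):
    """Baseline learner: always predicts the training mean."""
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_linear(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_knn(X, y, k=5):
    """k-nearest-neighbor regression using squared Euclidean distance."""
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nn = np.argsort(d, axis=1)[:, :k]
        return y[nn].mean(axis=1)
    return predict

def super_learner(X, y, learners, n_folds=5, seed=0):
    """Stacking sketch: cross-validated level-one data plus a least-squares metalearner."""
    n = len(y)
    Z = np.zeros((n, len(learners)))  # column j = out-of-fold predictions of learner j
    for fold in kfold_indices(n, n_folds, seed):
        train = np.setdiff1d(np.arange(n), fold)
        for j, fit in enumerate(learners):
            Z[fold, j] = fit(X[train], y[train])(X[fold])
    # Metalearner: unconstrained least squares on the level-one data (an assumption).
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)
    # Refit every base learner on the full data set for use at prediction time.
    full_models = [fit(X, y) for fit in learners]
    def predict(Xn):
        return np.column_stack([m(Xn) for m in full_models]) @ weights
    return predict, weights, Z

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.5 * X[:, 0] + rng.normal(0, 0.1, 200)
predict, weights, Z = super_learner(X, y, [fit_mean, fit_linear, fit_knn])
cv_mse = ((Z - y[:, None]) ** 2).mean(axis=0)  # cross-validated MSE per base learner
sl_mse = ((Z @ weights - y) ** 2).mean()        # metalearner's fit on the level-one data
print(weights, cv_mse, sl_mse)
```

Because any single base learner corresponds to a weight vector with a 1 in one position and 0 elsewhere, the least-squares combination can never fit the level-one data worse than the best individual column. That comparison is in-sample for the metalearner, though. An honest estimate of the ensemble's error would need an additional outer layer of cross-validation.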


Erin Ledell

Erin Ledell is the chief machine learning scientist at the company that created the open source distributed machine learning platform H2O. Previously, she was the principal data scientist at a company acquired by GE Digital in 2016 and at Marvin Mobile Security (acquired by Veracode in 2012), and she founded DataScientific, Inc. Erin holds a BS and an MA in mathematics, as well as a PhD from the University of California, Berkeley, where her research focused on scalable machine learning and statistical computing.