Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Scalable ensemble learning with H2O

Erin Ledell (H2O.ai)
1:30pm–2:00pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Average rating: 4.25 (12 ratings)

Ensemble machine-learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensembles when model performance matters more than other factors such as model complexity and training time. Because of their flexibility and their ability to outperform individual models, ensembles are the technique behind many winning Kaggle competition entries.

Erin Ledell covers the basics of ensemble learning and introduces H2O, a scalable open source machine-learning library. She then demonstrates the H2O Ensemble package, which reduces the computational burden of ensemble learning while retaining the superior model performance of ensembles.

The H2O Ensemble software implements the Super Learner (stacking) ensemble algorithm, using distributed base learning algorithms from H2O. The Super Learner algorithm learns the optimal combination of the base learner fits. (The 2007 article "Super Learner" shows why the Super Learner ensemble is asymptotically optimal: it performs asymptotically as well as the best possible weighted combination of the base learners.) Erin dives into these advanced topics and provides code demos for attendees to try on their own.
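As a rough illustration of how stacking "learns the optimal combination of the base learner fits," the following pure-Python sketch runs the Super Learner recipe on toy 1-D regression data. It collects V-fold cross-validated predictions from each base learner (the "level-one" data), chooses combination weights that minimize cross-validated mean squared error, and then refits every base learner on the full data. Everything here is an assumption for illustration and is not H2O's API: the three toy base learners (training mean, least-squares line, k-nearest neighbors), the fold assignment, and the metalearner, which is restricted to convex weights found by a coarse grid search in place of the distributed H2O algorithms that H2O Ensemble uses.

```python
"""Minimal Super Learner (stacking) sketch in pure Python.

Assumptions (not H2O's API): toy 1-D regression data, three hypothetical
toy base learners, and a metalearner restricted to convex weights found by
grid search. H2O Ensemble uses distributed H2O algorithms instead.
"""
import random
from itertools import product


def fit_mean(xs, ys):
    # Base learner 1: always predict the training mean.
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    # Base learner 2: ordinary least squares line y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    # Base learner 3: average of the k nearest training targets.
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_LEARNERS = [fit_mean, fit_linear, fit_knn]


def cross_validated_predictions(xs, ys, v=5):
    # Level-one data: out-of-fold predictions of every base learner.
    n = len(xs)
    folds = [list(range(i, n, v)) for i in range(v)]
    z = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for fold in folds:
        held_out = set(fold)
        tr_x = [xs[i] for i in range(n) if i not in held_out]
        tr_y = [ys[i] for i in range(n) if i not in held_out]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(tr_x, tr_y)
            for i in fold:
                z[i][j] = model(xs[i])
    return z


def fit_metalearner(z, ys, step=0.05):
    # Metalearner: convex weights minimizing cross-validated MSE (grid search).
    best_w, best_mse = None, float("inf")
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    for w1, w2 in product(grid, grid):
        if w1 + w2 > 1 + 1e-9:
            continue
        w = (w1, w2, max(0.0, 1 - w1 - w2))
        mse = sum((sum(wj * zj for wj, zj in zip(w, row)) - y) ** 2
                  for row, y in zip(z, ys)) / len(ys)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse


def super_learner(xs, ys, v=5):
    z = cross_validated_predictions(xs, ys, v)
    weights, cv_mse = fit_metalearner(z, ys)
    # Refit each base learner on all data; combine with the learned weights.
    models = [fit(xs, ys) for fit in BASE_LEARNERS]

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, models))

    return predict, weights, cv_mse


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-3, 3) for _ in range(200)]
    ys = [x * x + rng.gauss(0, 0.5) for x in xs]  # nonlinear true function
    predict, weights, cv_mse = super_learner(xs, ys)
    print("weights:", [round(w, 2) for w in weights])
    print("CV MSE of ensemble:", round(cv_mse, 3))
```

Because each single base learner corresponds to a vertex of the weight grid, the ensemble's cross-validated MSE in this setup can never be worse than that of the best individual learner; on the quadratic toy data the k-nearest-neighbor learner is likely to receive most of the weight. Note that this MSE is measured on the same folds used to choose the weights, so it is somewhat optimistic.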


Erin Ledell

H2O.ai

Erin Ledell is a statistician and machine-learning scientist at H2O.ai. Erin is the main author of H2O Ensemble. Before joining H2O.ai, she was the principal data scientist at Wise.io and at Marvin Mobile Security (acquired by Veracode in 2012), and she founded DataScientific, Inc. Erin received her PhD in biostatistics from the University of California, Berkeley, with a designated emphasis in computational science and engineering. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence-curve-based variance estimation, and statistical computing. Erin also holds a BS and an MA in mathematics.