Ensemble machine-learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. Because they are flexible and often outperform individual models, ensembles have been used to win many Kaggle competitions.
Erin Ledell covers the basics of ensemble learning and offers an introduction to the scalable open source machine-learning library H2O. Erin then gives a demonstration of the H2O Ensemble package, which reduces the computational burden of ensemble learning while retaining superior model performance.
The H2O Ensemble software implements the Super Learner (also known as stacking) ensemble algorithm, using distributed base learning algorithms from H2O. The Super Learner algorithm uses cross-validation to learn the optimal combination of the base learner fits. (The 2007 article “Super Learner” demonstrates why the Super Learner ensemble is an asymptotically optimal system for learning.) Erin dives into these advanced topics and provides code demos for attendees to try out on their own.
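To make the stacking idea concrete, here is a minimal sketch in plain Python. It is not the H2O Ensemble implementation and does not use H2O's API. The three toy base learners, the function names, the synthetic data, and the grid-search metalearner are all assumptions chosen for illustration; Super Learner implementations commonly fit the metalearner with a method such as non-negative least squares instead. The sketch collects cross-validated predictions from each base learner, then picks the convex combination of those predictions with the lowest cross-validated error.

```python
import random

# Toy base learners (illustrative assumptions): each takes training data
# and returns a prediction function for a single input x.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def cross_val_predictions(learners, xs, ys, k=5):
    """Return Z, where Z[i][j] is learner j's prediction for row i,
    made by a model that never saw row i during training."""
    n = len(xs)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(tx, ty)
            for i in held_out:
                Z[i][j] = model(xs[i])
    return Z

def fit_metalearner(Z, ys, steps=20):
    """Grid search over convex weights for exactly three base learners
    (a simple stand-in for a least-squares metalearner)."""
    best_err, best_w = None, None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            preds = [sum(wj * zj for wj, zj in zip(w, row)) for row in Z]
            err = mse(preds, ys)
            if best_err is None or err < best_err:
                best_err, best_w = err, w
    return best_err, best_w

def super_learner(learners, xs, ys, k=5):
    Z = cross_val_predictions(learners, xs, ys, k)
    cv_mse, weights = fit_metalearner(Z, ys)
    # Refit every base learner on all the data for the final ensemble.
    models = [fit(xs, ys) for fit in learners]
    predict = lambda x: sum(w * m(x) for w, m in zip(weights, models))
    base_cv_mse = [mse([row[j] for row in Z], ys) for j in range(len(learners))]
    return predict, weights, cv_mse, base_cv_mse

# Synthetic data (assumption): a noisy straight line.
random.seed(0)
xs = [i / 10 for i in range(60)]
ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]

predict, weights, cv_mse, base_cv_mse = super_learner(
    [fit_mean, fit_linear, fit_nearest], xs, ys
)
print("weights:", weights)
print("ensemble CV MSE:", cv_mse, "base CV MSEs:", base_cv_mse)
```

Because the weight grid includes the cases where a single learner gets all the weight, the ensemble's cross-validated error on this grid can be no worse than that of the best individual base learner. That is a small-scale illustration of why the combination is learned from held-out predictions rather than from in-sample fits.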
Erin Ledell is the chief machine learning scientist at H2O.ai, the company that created the open source distributed machine learning platform H2O. Previously, she was the principal data scientist at Wise.io (acquired by GE Digital in 2016) and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc. Erin holds a PhD from the University of California, Berkeley, where her research focused on scalable machine learning and statistical computing, as well as a BS and MA in mathematics.