
MLbase: Distributed Machine Learning Made Easy

Ameet Talwalkar (Carnegie Mellon University and Determined AI), Evan Sparks (Determined AI)
Data Science
Ballroom AB
Average rating: 4.14 (7 ratings)

Implementing and consuming machine learning (ML) techniques at scale is difficult for both ML developers and end users. MLbase (www.mlbase.org) is an open-source platform, under active development, that addresses the needs of both groups. MLbase consists of three components: MLlib, MLI, and ML Optimizer. MLlib is a low-level distributed ML library written against Spark; MLI is an API and platform for feature extraction and algorithm development that introduces high-level ML programming abstractions; and ML Optimizer is a layer that aims to simplify ML problems for end users by automating feature and model selection. In this talk we will describe the high-level functionality of each of these layers and demonstrate their scalability and ease of use through real-world examples involving classification, regression, clustering, and collaborative filtering.
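To make the idea of automated model selection more concrete, here is a minimal, single-machine Python sketch of the basic loop: fit several candidate models, score each one on held-out validation data, and keep the best. Every name here (`fit_mean`, `make_ridge`, `select_model`, `CANDIDATES`) is hypothetical and is not part of MLbase's API. The real ML Optimizer is described as searching over models trained with MLlib on Spark, whereas this toy uses plain Python and 1-D ridge regression.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]          # (feature x, target y)
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[Point]], Predictor]


def fit_mean(train: Sequence[Point]) -> Predictor:
    """Baseline candidate (hypothetical): always predict the mean target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def make_ridge(lam: float) -> Fitter:
    """Return a fitter for 1-D ridge regression with an unpenalized intercept."""
    def fit(train: Sequence[Point]) -> Predictor:
        n = len(train)
        mx = sum(x for x, _ in train) / n
        my = sum(y for _, y in train) / n
        sxy = sum((x - mx) * (y - my) for x, y in train)
        sxx = sum((x - mx) ** 2 for x, _ in train)
        w = sxy / (sxx + lam)        # slope shrunk toward 0 as lam grows
        b = my - w * mx              # intercept from the centered fit
        return lambda x: w * x + b
    return fit


def mse(predict: Predictor, data: Sequence[Point]) -> float:
    """Mean squared error of a predictor on a labeled dataset."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def select_model(candidates: List[Tuple[str, Fitter]],
                 train: Sequence[Point],
                 valid: Sequence[Point]) -> Tuple[str, float]:
    """Fit every candidate on train, score on valid, return (name, error) of the best."""
    scored = [(name, mse(fit(train), valid)) for name, fit in candidates]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical search space: a baseline plus ridge models with different penalties.
CANDIDATES: List[Tuple[str, Fitter]] = [("mean", fit_mean)] + [
    (f"ridge(lambda={lam})", make_ridge(lam)) for lam in (0.0, 1.0, 10.0)
]

# Toy data drawn from y = 2x + 1, split into training and validation sets.
train = [(float(x), 2.0 * x + 1.0) for x in range(8)]
valid = [(float(x), 2.0 * x + 1.0) for x in (8, 9)]

best_name, best_err = select_model(CANDIDATES, train, valid)
print(best_name, best_err)
```

Because the toy data is exactly linear, the unregularized ridge model wins with zero validation error. On real, noisy data a larger penalty can win instead, which is exactly the trade-off an automated optimizer is meant to explore on the user's behalf.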


Ameet Talwalkar

Assistant Professor; Cofounder and Chief Scientist, Carnegie Mellon University and Determined AI

Ameet Talwalkar is an NSF post-doctoral fellow in the Computer Science Division at UC Berkeley. His work focuses on devising scalable machine learning algorithms, and more recently, on interdisciplinary approaches for connecting advances in machine learning to large-scale problems in science and technology. He obtained a bachelor’s degree from Yale University and a Ph.D. from the Courant Institute at New York University.


Evan Sparks

CEO, Determined AI

Evan Sparks is cofounder and CEO of Determined AI, a software company that makes machine learning engineers and data scientists fantastically more productive. Previously, Evan worked in quantitative finance and web intelligence. He holds a PhD in computer science from UC Berkeley, where, as a member of the AMPLab, he contributed to the design and implementation of much of the large-scale machine learning ecosystem around Apache Spark, including MLlib and KeystoneML. He also holds an AB in computer science from Dartmouth College.


Comments

Nasir Uddin
02/08/2014 4:37pm PST

I am very interested in this talk, to learn about the accuracy, simplicity, speed, and interpretability of MLbase algorithms when applied in a distributed environment.