Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Velox: The Missing Piece in the Predictive Analytics Stack

Daniel Crankshaw (UC Berkeley)
1:50pm–2:10pm Friday, 02/20/2015
Data Science
Location: LL20 A
Average rating: 4.67 (3 ratings)

The AMPLab’s Berkeley Data Analytics Stack (BDAS) has been the incubator for a series of widely adopted open source Big Data systems, including Apache Spark, Apache Mesos, Tachyon, MLlib, and GraphX. In this talk, I will introduce Velox, the newest component of BDAS. Velox is the missing piece in the predictive analytics stack: it enables interactive applications, ranging from content recommendations to personalized search, by addressing the challenges of serving and managing personalized machine learning models at scale. Velox provides end-user applications and services with an intuitive, low-latency interface to models, transforming the raw statistical models currently trained offline into full-blown, end-to-end data products capable of dynamically targeting advertisements, recommending products, and personalizing web content. I’ll describe how Velox achieves this functionality, including its abilities to span online and offline training systems, to adaptively adjust model selection strategies, and to address the statistical challenges associated with sample bias in a closed-loop system.

Daniel Crankshaw

UC Berkeley

Daniel Crankshaw is a second-year PhD student working in the UC Berkeley AMPLab with Michael Franklin. Dan’s research focuses on how ideas from distributed database systems can be applied to machine learning and data analytics tasks. Prior to the Velox model management project, Dan worked on GraphX, a system that applies distributed database techniques to graph analytics.