Presented by O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

How to make analytic operations look more like DevOps: Lessons learned moving machine-learning algorithms to production environments

Robert Grossman (University of Chicago)
2:40pm–3:20pm Wednesday, 03/30/2016
Average rating: 3.86 (14 ratings)

Prerequisite knowledge

Participants should have some experience working with IT to move machine-learning algorithms to production environments.

Description

Robert Grossman discusses lessons learned from moving machine-learning algorithms that are usually run manually by data scientists into operational environments, where they run automatically on the new data that arrives each day. Robert draws on three case studies to extract several techniques that have consistently proved useful and to discuss how these techniques can best be used in practice: the first case study deals with the development of a system to analyze genomic datasets; the second describes the development of a system for the daily processing of new hyperspectral images to look for patterns of interest; and the third involves the incremental improvement of a change-detection algorithm run on streaming data. A minimal sketch of what such a daily scoring job can look like follows the topic list below.

Topics from these case studies include:

  • Automating the classification of data-quality errors
  • Optimizing the scheduling and running of machine-learning algorithms to maintain throughput
  • Developing dashboards for analytic operations that provide a common operating picture
  • Integrating DevOps with analytic operations
  • Automating the analysis of why some of the algorithms fail on some of the data
  • Why languages for analytic models and analytic workflows are critical to incrementally improving analytic operations
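The topics above center on running models automatically against each day's new data and keeping the pipeline moving when individual inputs fail. As a minimal illustrative sketch only (not material from the talk), the following Python shows one common shape for such a daily scoring job; the directory layout, the pickled model artifact, and its predict() method are all hypothetical assumptions:

    import csv
    import json
    import logging
    import pathlib
    import pickle

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("analytic-ops")

    INCOMING = pathlib.Path("data/incoming")          # hypothetical drop directory for the day's new files
    SCORED = pathlib.Path("data/scored")              # hypothetical output directory
    MODEL_PATH = pathlib.Path("models/current.pkl")   # hypothetical serialized model artifact


    def score_new_files():
        """Score every not-yet-processed file and log failures for later triage."""
        with MODEL_PATH.open("rb") as f:
            # Assumption: the artifact exposes predict(rows) and returns one numeric score per row.
            model = pickle.load(f)

        SCORED.mkdir(parents=True, exist_ok=True)
        for path in sorted(INCOMING.glob("*.csv")):
            out_path = SCORED / (path.stem + ".json")
            if out_path.exists():                      # idempotent: skip files already scored
                continue
            try:
                with path.open(newline="") as f:
                    rows = list(csv.DictReader(f))
                scores = [float(s) for s in model.predict(rows)]
                out_path.write_text(json.dumps({"file": path.name, "scores": scores}))
                log.info("scored %s (%d rows)", path.name, len(rows))
            except Exception:
                # Record the failure and keep going, so one bad file does not
                # stall the run; failed files can be analyzed afterwards.
                log.exception("failed to score %s", path.name)


    if __name__ == "__main__":
        score_new_files()

In practice a job like this would be run on a schedule (for example, with cron or a workflow engine), and its logs and failure counts would feed the kind of operations dashboard and automated failure analysis described in the topics above.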

Robert Grossman

University of Chicago

Robert Grossman is a faculty member and the chief research informatics officer in the Biological Sciences Division of the University of Chicago. Robert is the director of the Center for Data Intensive Science (CDIS) and a senior fellow at both the Computation Institute (CI) and the Institute for Genomics and Systems Biology (IGSB). He is also the founder and a partner of the Open Data Group, which specializes in building predictive models over big data. Robert has led the development of open source software tools for analyzing big data (Augustus), distributed computing (Sector), and high-performance networking (UDT). In 1996, he founded Magnify, Inc., which provided data-mining solutions to the insurance industry and was sold to ChoicePoint in 2005. He is also the chair of the Open Cloud Consortium, a not-for-profit that supports the research community by operating cloud infrastructure, such as the Open Science Data Cloud. He blogs occasionally about big data, data science, and data engineering at Rgrossman.com.