Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Introduction to generalized low-rank models and missing values

Jo-fai Chow
14:05–14:45 Thursday, 2/06/2016
Data science & advanced analytics
Location: Capital Suite 8/9
Level: Intermediate
Average rating: 3.50 (2 ratings)

Prerequisite knowledge

Attendees do not need a background in data science but should be familiar with Python, Scala, or R.


Across business and research, analysts seek to understand large collections of data with numeric, Boolean, and categorical values. Many entries in the table may be noisy or even missing altogether. Low-rank models facilitate understanding of tabular data by producing a condensed vector representation for every row and column in the dataset. These representations can then be compared, clustered, plotted, and used in subsequent analysis.

Jo-fai Chow offers an overview of low-rank models and demonstrates how to build them in H2O, an open source distributed machine-learning platform. Through examples, Jo-fai explains how to fit low-rank models to numeric and categorical datasets with missing values and how to use these models to identify important features and make better predictions.

Topics include:

  • Imputing missing data entries
  • Compressing features into essential archetypes
  • Reducing high-cardinality categorical data into numeric columns
  • Visualizing results in R and Python
  • Integrating low-rank models into machine-learning pipelines
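
To make the first topic concrete, here is a minimal sketch of imputing missing entries with a low-rank model. It is not H2O's API and not the speaker's code: it assumes the simplest possible setting (rank 1, quadratic loss, no regularization, numeric data only) and fits the model by alternating least squares over the observed cells. The names `fit_rank1`, `impute`, and the toy `table` are hypothetical, chosen only for illustration. Generalized low-rank models extend the same idea to higher ranks, regularizers, and losses suited to Boolean and categorical columns.

```python
def fit_rank1(table, n_iter=100):
    """Fit table[i][j] ~ x[i] * y[j] using only observed (non-None) cells.

    Assumption: rank 1 with quadratic loss, solved by alternating least squares.
    Every row and every column must contain at least one observed value.
    """
    n_rows, n_cols = len(table), len(table[0])
    x = [1.0] * n_rows  # one condensed value per row
    y = [1.0] * n_cols  # one condensed value per column
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if table[i][j] is not None]
            x[i] = sum(table[i][j] * y[j] for j in obs) / sum(y[j] ** 2 for j in obs)
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if table[i][j] is not None]
            y[j] = sum(table[i][j] * x[i] for i in obs) / sum(x[i] ** 2 for i in obs)
    return x, y


def impute(table, x, y):
    """Keep observed cells and fill missing ones with the model's estimate."""
    return [
        [x[i] * y[j] if value is None else value for j, value in enumerate(row)]
        for i, row in enumerate(table)
    ]


# Toy table: every row is a multiple of [2, 4, 6]; two cells are missing.
table = [
    [2.0, 4.0, None],
    [4.0, 8.0, 12.0],
    [None, 12.0, 18.0],
]
x, y = fit_rank1(table)
filled = impute(table, x, y)
print(filled)
```

Because the toy table is exactly rank 1, the fitted model recovers both missing cells (each is close to 6). Real tables are noisy, so a practical model would use a higher rank and regularization rather than this bare sketch.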

Jo-fai Chow

Jo-fai (Joe) Chow is a customer data scientist at H2O. Joe liaises with customers to expand the use of H2O beyond the initial footprint. Before joining H2O, he was on the business intelligence team at Virgin Media, where he developed data products to enable quick and smart business decisions. Joe also worked part-time for Domino Data Lab as a data science evangelist, promoting products via blogging and giving talks at meetups.