Across business and research, analysts seek to understand large collections of data with numeric, Boolean, and categorical values. Many entries in the table may be noisy or even missing altogether. Low-rank models facilitate understanding of tabular data by producing a condensed vector representation for every row and column in the dataset. These representations can then be compared, clustered, plotted, and used in subsequent analysis.
Jo-fai Chow offers an overview of low-rank models and demonstrates how to build them in H2O, an open source distributed machine-learning platform. Through examples, Jo-fai explains how to fit low-rank models to numeric and categorical datasets with missing values and how to use these models to identify important features and make better predictions.
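To make the idea of fitting a low-rank model to a table with missing values concrete, here is a minimal NumPy sketch. It is not H2O's implementation and does not use H2O's API. It assumes a purely numeric table, quadratic loss, and ridge regularization, and fits the factors by alternating least squares over the observed entries only. The function name `fit_low_rank`, its parameters, and the toy table are hypothetical, chosen for illustration.

```python
import numpy as np

def fit_low_rank(A, k=2, reg=0.1, n_iters=50, seed=0):
    """Fit A ~ X @ Y using only the observed (non-NaN) entries of A.

    X holds one k-dimensional vector per row, Y one per column.
    Assumes quadratic loss with ridge regularization (an illustrative choice).
    """
    A = np.asarray(A, dtype=float)
    mask = ~np.isnan(A)                      # True where an entry is observed
    n, m = A.shape
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n, k))   # row representations
    Y = rng.normal(scale=0.1, size=(k, m))   # column representations
    eye = reg * np.eye(k)

    def objective():
        # Squared error on observed entries plus the ridge penalty.
        resid = np.where(mask, A - X @ Y, 0.0)
        return float((resid ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))

    history = [objective()]
    for _ in range(n_iters):
        # Update each row vector with the column vectors held fixed.
        for i in range(n):
            obs = mask[i]
            Yo = Y[:, obs]
            X[i] = np.linalg.solve(Yo @ Yo.T + eye, Yo @ A[i, obs])
        # Update each column vector with the row vectors held fixed.
        for j in range(m):
            obs = mask[:, j]
            Xo = X[obs]
            Y[:, j] = np.linalg.solve(Xo.T @ Xo + eye, Xo.T @ A[obs, j])
        history.append(objective())
    return X, Y, history

# Toy table with missing entries marked as NaN (made-up values).
A = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, 2.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 4.0, 4.0],
    [1.0, np.nan, 5.0, 4.0],
])
X, Y, history = fit_low_rank(A, k=2)
A_hat = X @ Y   # dense reconstruction, including imputed values for the NaNs
```

Each row of `X` and each column of `Y` is the condensed vector representation of a table row or column, and `A_hat` fills in the missing cells. Categorical and Boolean columns would need other loss functions, which this sketch does not cover.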
Jo-fai (Joe) Chow is a customer data scientist at H2O.ai. Joe liaises with customers to expand the use of H2O beyond the initial footprint. Before joining H2O, he was on the business intelligence team at Virgin Media, where he developed data products to enable quick and smart business decisions. Joe also worked (part-time) for Domino Data Lab as a data science evangelist, promoting products via blogging and giving talks at meetups.