Effective sampling methods within TensorFlow input functions

Laxmi Prajapat (Datatonic), William Fletcher (Datatonic)

11:50am–12:30pm Thursday, October 31, 2019

Location: Grand Ballroom H

Applications

Who is this presentation for?

Machine learning practitioners, data scientists, and ML engineers

Level

Intermediate

Description

Many real-world machine learning applications require generative or reductive sampling of data. At training time this may be to deal with class imbalance (e.g., rarity of positives in a binary classification problem or a sparse user-item interaction matrix) or to augment the data stored on file; it may also simply be a matter of efficiency. Laxmi Prajapat and William Fletcher explore some sampling techniques in the context of recommender systems, using tools available in the tf.data API, and detail which methods are beneficial with given data and hardware demands. They present quantitative results, along with a closer examination of potential pros and cons.
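As an illustration of reductive and generative sampling inside an input function, here is a minimal sketch of class rebalancing with tf.data. It is not the speakers' code: the function name `balanced_dataset`, the toy data, and the 50/50 target ratio are assumptions made for this example.

```python
import numpy as np
import tensorflow as tf


def balanced_dataset(features, labels, batch_size=32, seed=0):
    """Yield batches with roughly equal numbers of positives and negatives."""
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    # Split by class; repeat each stream so the rare class never runs out.
    positives = ds.filter(lambda x, y: tf.equal(y, 1)).repeat()
    negatives = ds.filter(lambda x, y: tf.equal(y, 0)).repeat()
    # Newer TF releases expose this as a Dataset static method; older ones
    # only provide the tf.data.experimental version.
    sample = getattr(tf.data.Dataset, "sample_from_datasets", None)
    if sample is None:
        sample = tf.data.experimental.sample_from_datasets
    mixed = sample([positives, negatives], weights=[0.5, 0.5], seed=seed)
    return mixed.batch(batch_size)


# Hypothetical toy data: 1% positives.
toy_labels = np.zeros(1000, dtype=np.int64)
toy_labels[:10] = 1
toy_features = np.arange(1000, dtype=np.float32)

sampled = np.concatenate(
    [y.numpy() for _, y in balanced_dataset(toy_features, toy_labels).take(20)]
)
print("positive fraction after resampling:", sampled.mean())
```

Because both class streams repeat indefinitely, the resulting dataset is infinite and the rare positives are reused many times; training would need a fixed number of steps, and heavy reuse of a few rows is one of the trade-offs such a method carries.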

The naive approach is to precompute a subsample of the data, which makes for a fast input function. But a fixed subsample is drawn only once, so taking advantage of fresh random samples during training requires more work. Laxmi and William consider how to select from a large dataset containing all possible inputs, and they look at generating inputs in memory using tf.random and exploiting hash tables where appropriate. These methods grant additional flexibility and reduce data preparation workloads.
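One way to picture in-memory generation with tf.random and a hash table is negative sampling for a recommender. The sketch below is an assumption-laden illustration rather than the method presented in the session: `NUM_ITEMS`, `make_positive_table`, `with_negatives`, and the encoding of a (user, item) pair as a single integer key are all hypothetical choices made for this example.

```python
import numpy as np
import tensorflow as tf

NUM_ITEMS = 1000  # hypothetical catalogue size


def make_positive_table(users, items):
    """Hash table mapping an encoded (user, item) pair to 1 if it was observed."""
    keys = tf.constant(users, tf.int64) * NUM_ITEMS + tf.constant(items, tf.int64)
    init = tf.lookup.KeyValueTensorInitializer(keys, tf.ones_like(keys))
    return tf.lookup.StaticHashTable(init, default_value=0)


def with_negatives(users, items, table, negatives_per_positive=4):
    """Expand each stored positive into itself plus randomly drawn items."""
    ds = tf.data.Dataset.from_tensor_slices((users, items))

    def expand(user, item):
        # Candidate negatives are generated in memory, not read from file.
        drawn = tf.random.uniform(
            [negatives_per_positive], minval=0, maxval=NUM_ITEMS, dtype=tf.int64
        )
        all_items = tf.concat([tf.expand_dims(item, 0), drawn], axis=0)
        all_users = tf.fill([negatives_per_positive + 1], user)
        # Label via lookup, so a drawn item that is really a known positive
        # is not mislabelled as a negative.
        labels = table.lookup(all_users * NUM_ITEMS + all_items)
        return tf.data.Dataset.from_tensor_slices((all_users, all_items, labels))

    return ds.flat_map(expand)


# Hypothetical observed interactions (only positives are stored).
obs_users = np.array([0, 0, 1, 2], dtype=np.int64)
obs_items = np.array([5, 7, 5, 9], dtype=np.int64)
table = make_positive_table(obs_users, obs_items)

rows = [
    (int(u), int(i), int(y))
    for u, i, y in with_negatives(obs_users, obs_items, table)
]
print(len(rows), "training rows generated from", len(obs_users), "positives")
```

Only the observed interactions need to be stored; the negatives are redrawn on every pass over the data, which is the kind of flexibility a precomputed subsample cannot offer.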

Prerequisite knowledge

Experience with the tf.data API

What you'll learn

Learn how to efficiently use input functions to balance data by sampling or generating from existing rows
See a demonstration using open-sourced example code

Laxmi Prajapat

Datatonic

Laxmi Prajapat is a senior data scientist at Datatonic, with involvement in end-to-end project delivery, including stakeholder management, data exploration, machine learning, algorithm design, automation, and productionizing solutions on Google Cloud. After completing a master's degree in astrophysics at UCL, Laxmi held several technical roles in industry. She's at her happiest when learning new things and challenging herself. Laxmi is always looking to expand her knowledge and apply it practically, especially in the fields of machine learning and engineering. Outside of work, she enjoys exploring new cuisines or finding a book to get lost in.

William Fletcher

Datatonic

Will Fletcher is a machine learning (ML) researcher at Datatonic, where he concentrates on the technological progress of the company. He contributes an understanding of the most advanced methods in ML, along with experience in research and an eye for innovation. His academic career began in chemistry at Oxford; he later moved to UCL for a further MSc in computational statistics and ML. Project and research work aside, Will delivers training days for companies to help them get started with ML. He believes in continuous education and learning as an essential part of technical excellence. This passion extends into his personal life, where he plays with math, programming, and puzzles.