Effective sampling methods within TensorFlow input functions
Who is this presentation for?
- Machine learning practitioners, data scientists, and ML engineers
Many real-world machine learning applications require generative or reductive sampling of data. At training time, this may be to deal with class imbalance (e.g., the rarity of positives in a binary classification problem or in a sparse user-item interaction matrix) or to augment the data stored on file; it may also simply be a matter of efficiency. Laxmi Prajapat and William Fletcher explore some sampling techniques in the context of recommender systems, using tools available in the tf.data API, and detail which methods are best suited to given data and hardware demands. They present quantitative results, along with a closer examination of potential pros and cons.
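As a minimal sketch of rebalancing inside an input function (an illustration, not the presenters' code), the snippet below oversamples the rare positive class with `tf.data.Dataset.sample_from_datasets`, which is a `Dataset` method in recent TensorFlow 2.x releases (older releases expose it as `tf.data.experimental.sample_from_datasets`). The toy data, the 5% positive rate, the 50/50 target weights and the helper `positive_rate` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 1,000 rows, of which only 5% are positives.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 4)).astype("float32")
labels = (np.arange(1000) % 20 == 0).astype("int64")  # every 20th row is positive

full = tf.data.Dataset.from_tensor_slices((features, labels))

# Split the stream by class so each class can be drawn from independently.
positives = full.filter(lambda x, y: tf.equal(y, 1))
negatives = full.filter(lambda x, y: tf.equal(y, 0))

# Oversample the rare class: repeat() makes each class stream endless, and the
# weights set the expected class mix of the output stream (50/50 here).
balanced = tf.data.Dataset.sample_from_datasets(
    [positives.shuffle(100).repeat(), negatives.shuffle(1000).repeat()],
    weights=[0.5, 0.5],
    seed=42,
)


def positive_rate(ds, n=2000):
    """Hypothetical helper: fraction of positive labels among the first n elements."""
    ys = np.array([int(y) for _, y in ds.take(n).as_numpy_iterator()])
    return ys.mean()


print(positive_rate(full), positive_rate(balanced))
```

Filtering one source twice keeps the sketch short but scans the data once per class; storing each class in its own files avoids that cost. Undersampling the majority class (for example, keeping only a random fraction of negatives with `filter`) is the reductive alternative.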
Naively, a precomputed subsample of data makes for a fast input function. But to benefit from fresh random samples, more must be done. Laxmi and William consider how to select from a large dataset containing all possible inputs, and they look at generating these inputs in memory using tf.random, exploiting hash tables where appropriate. These methods grant additional flexibility and reduce data preparation workloads.
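A minimal sketch of the in-memory approach, assuming implicit-feedback data in which only observed (user, item) pairs are stored: candidate negatives are drawn with `tf.random.uniform` and checked against a `tf.lookup.StaticHashTable` of known positives, so that true interactions are not mislabelled as negatives. The catalogue size `NUM_ITEMS`, the toy interactions, the pair-to-key encoding and the helper `with_negatives` are hypothetical choices for illustration, not the presenters' implementation.

```python
import tensorflow as tf

NUM_ITEMS = 50  # hypothetical catalogue size; item ids assumed to lie in [0, NUM_ITEMS)

# Hypothetical observed interactions (the positives) as (user_id, item_id) pairs.
pos_users = tf.constant([0, 0, 1, 2, 2, 2], dtype=tf.int64)
pos_items = tf.constant([3, 7, 3, 1, 4, 9], dtype=tf.int64)

# Encode each pair as one int64 key so membership can be checked in a hash table.
pos_keys = pos_users * NUM_ITEMS + pos_items
seen = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(pos_keys, tf.ones_like(pos_keys)),
    default_value=0,  # unseen pairs look up as 0
)


def with_negatives(user, item, num_neg=4):
    """Emit the observed pair (label 1) plus up to num_neg sampled negatives (label 0)."""
    # Draw candidate items uniformly at random, entirely in memory.
    cand = tf.random.uniform([num_neg], maxval=NUM_ITEMS, dtype=tf.int64)
    # Keep only candidates the user has not interacted with (table lookup returns 0).
    keep = tf.equal(seen.lookup(user * NUM_ITEMS + cand), 0)
    neg_items = tf.boolean_mask(cand, keep)
    items = tf.concat([tf.expand_dims(item, 0), neg_items], axis=0)
    labels = tf.concat([tf.ones([1], tf.int64), tf.zeros_like(neg_items)], axis=0)
    users = tf.fill(tf.shape(items), user)
    return tf.data.Dataset.from_tensor_slices((users, items, labels))


positives = tf.data.Dataset.from_tensor_slices((pos_users, pos_items))
train = positives.flat_map(with_negatives)  # fresh negatives on every pass

rows = list(train.as_numpy_iterator())
```

Rejected candidates are dropped rather than redrawn, so a positive may receive fewer than `num_neg` negatives; drawing extra candidates and truncating is one simple way to tighten this. Because sampling happens inside the pipeline, each epoch sees new negatives without any precomputed negative files.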
Prerequisite knowledge
- Experience with the tf.data API
What you'll learn
- Learn how to efficiently use input functions to balance data by sampling or generating from existing rows
- See a demonstration using open-sourced example code
Laxmi Prajapat is a senior data scientist at Datatonic, with involvement in end-to-end project delivery, including stakeholder management, data exploration, machine learning, algorithm design, automation, and productionization solutions on Google Cloud. After a master's degree in astrophysics from UCL, Laxmi has held several technical roles in industry. She’s at her happiest when learning new things and challenging herself. Laxmi is always looking to expand her knowledge and apply it practically, especially in the fields of machine learning and engineering. Outside of work, she enjoys exploring new cuisines or finding a book to get lost in.
Will Fletcher is a machine learning (ML) researcher at Datatonic, where he focuses on advancing the company's technology. He contributes an understanding of the most advanced methods in ML, along with research experience and an eye for innovation. His academic career began in chemistry at Oxford; he later moved to UCL for a further MSc in computational statistics and ML. Alongside project and research work, Will delivers training days to help companies get started with ML. He believes in continuous education and learning as an essential part of technical excellence, a passion that extends into his personal life, where he plays with math, programming, and puzzles.