Put AI to Work
April 15-18, 2019
New York, NY

Beyond Word2Vec: Using embeddings to chart out the ebb and flow of tech skills

Maryam Jahanshahi (TapRecruit)
1:50pm-2:30pm Wednesday, April 17, 2019
Case Studies, Machine Learning
Location: Sutton South
Secondary topics:  AI in the Enterprise, Text, Language, and Speech
Average rating: *****
(5.00, 6 ratings)

Who is this presentation for?

  • NLP practitioners and ML engineers

Level

Intermediate

Prerequisite knowledge

  • A basic understanding of word embedding models, such as word2vec (useful but not required)

What you'll learn

  • Explore lessons learned from implementing different word embedding methods (from pretrained to custom)
  • Discover how to map trends from a combination of natural language and structured data
  • See how tech skills have varied across industries and functions and over time

Description

Many data scientists are familiar with word embedding models such as word2vec, which capture semantic similarity among words and phrases in a corpus. However, word embeddings are limited in their ability to interrogate a corpus alongside other context or over time. Moreover, word embedding models need either significant amounts of data or tuning through transfer learning to handle the domain-specific vocabulary that’s unique to most commercial applications.
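
As a rough illustration of what "semantic similarity" means for an embedding model, the sketch below ranks words by the cosine similarity of their vectors. The three-dimensional vectors and the vocabulary are invented for this example; a trained word2vec model would supply much higher-dimensional vectors learned from a corpus.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, invented for illustration only.
toy_embeddings = {
    "python": [0.9, 0.1, 0.2],
    "pandas": [0.8, 0.2, 0.3],
    "negotiation": [0.1, 0.9, 0.4],
}

def most_similar(word, embeddings):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("python", toy_embeddings)
```

With these toy vectors, "pandas" ranks above "negotiation" for the query "python". Nothing in this calculation accounts for time or for structured attributes of a document, which is the limitation the paragraph above describes.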

Maryam Jahanshahi discusses exponential family embeddings, which apply probabilistic embedding models to other data types. Developed by Rudolph and Blei, these methods extend the idea of word embeddings to other types of high-dimensional data. Maryam demonstrates how they can be used to conduct advanced topic modeling on medium-sized datasets that are specialized enough to require significant modifications of a word2vec model and that contain more general data types (including categorical, count, and continuous).
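
To make the idea concrete, here is a minimal sketch of one member of the exponential family embedding class, a Poisson embedding for count data. It assumes the general form in which each observation is modeled conditionally on its context, with a natural parameter given by the inner product of the item's embedding vector and the count-weighted sum of the context vectors. The identity link, the two-dimensional vectors, and the skill names are assumptions made for this example, not details from the talk.

```python
import math

# Hypothetical 2-dimensional vectors, invented for illustration only.
rho = {"sql": [0.5, -0.2]}                            # embedding vectors
alpha = {"python": [0.4, 0.1], "excel": [-0.3, 0.6]}  # context vectors

def context_sum(context_counts, alpha):
    """Sum the context vectors alpha[j], each weighted by its observed count."""
    dim = len(next(iter(alpha.values())))
    total = [0.0] * dim
    for item, count in context_counts.items():
        for k in range(dim):
            total[k] += count * alpha[item][k]
    return total

def poisson_log_likelihood(count, item, context_counts, rho, alpha):
    """log p(count | context) for a Poisson whose log-rate is rho[item] . context."""
    ctx = context_sum(context_counts, alpha)
    eta = sum(r * c for r, c in zip(rho[item], ctx))  # natural parameter (log rate)
    return count * eta - math.exp(eta) - math.lgamma(count + 1)

# Log-probability of observing "sql" twice given hypothetical context counts.
log_p = poisson_log_likelihood(2, "sql", {"python": 3, "excel": 1}, rho, alpha)
```

Swapping the Poisson for a Bernoulli or Gaussian conditional changes only the last line of `poisson_log_likelihood`, which is what lets the same embedding structure cover categorical, count, and continuous data. Fitting would then maximize the sum of these conditional log-likelihoods over `rho` and `alpha`, which this sketch does not attempt.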

Maryam explains how TapRecruit implemented a dynamic embedding model using TensorFlow and its proprietary corpus of job descriptions. Using both categorical and natural language data associated with jobs, the company charted the development of different skill sets over the last three years. You’ll discover how tech and data science skill sets have developed, grown, and pollinated other types of jobs over time and explore other segmentation analyses TapRecruit performed, such as by company type or geography.
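
The abstract does not describe the model's internals, so the following is only a sketch of one common way dynamic embeddings are formulated: each term gets a separate embedding per time slice, and a Gaussian random-walk prior ties consecutive slices together so that a term's vector can drift but not jump arbitrarily. The function name, the `precision` parameter, and the toy trajectories are assumptions made for illustration and may differ from what TapRecruit built.

```python
def random_walk_log_prior(trajectory, precision):
    """Unnormalized Gaussian random-walk log-prior for one term.

    `trajectory` is a list of embedding vectors ordered by time slice.
    Larger `precision` penalizes movement between slices more heavily.
    """
    squared_steps = 0.0
    for previous, current in zip(trajectory, trajectory[1:]):
        squared_steps += sum((c - p) ** 2 for p, c in zip(previous, current))
    return -0.5 * precision * squared_steps

# Hypothetical embeddings of one skill across three yearly slices.
stable_skill = [[0.2, 0.1], [0.2, 0.1], [0.2, 0.1]]
shifting_skill = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]

stable_score = random_walk_log_prior(stable_skill, precision=2.0)
shifting_score = random_walk_log_prior(shifting_skill, precision=2.0)
```

In a full model this term would be added to the data log-likelihood for each time slice, so a skill's vector moves only when the job descriptions in that slice give enough evidence to outweigh the penalty. Comparing a term's nearest neighbors slice by slice is then one way to chart how its usage changes.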

This talk is for both data science practitioners and data engineers. Rather than describe mathematical underpinnings, it focuses on the evolution of topic modeling systems using NLP.


Maryam Jahanshahi

TapRecruit

Maryam Jahanshahi is a research scientist at TapRecruit, a platform that uses AI and automation tools to bring efficiency and fairness to the recruiting process. She holds a PhD from the Icahn School of Medicine at Mount Sinai, where she studied molecular regulators of organ-size control. Maryam’s long-term research goal is to reduce bias in decision making by using a combination of computational linguistics, machine learning, and behavioral economics methods.