Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Beyond Word2Vec: Using embeddings to chart out the ebb and flow of tech skills

Maryam Jahanshahi (TapRecruit)
1:50pm-2:30pm Wednesday, April 17, 2019
Case Studies, Machine Learning
Location: Sutton South
Secondary topics:  AI in the Enterprise, Text, Language, and Speech

Who is this presentation for?

NLP Practitioners, ML Engineers



Prerequisite knowledge

A basic understanding of word embedding models such as word2vec is helpful but not required.

What you'll learn

- Lessons learnt from implementing different word embedding methods (from pretrained to custom)
- How to map trends from a combination of natural language and structured data
- How tech skills have varied across industries, functions, and over time


Many data scientists are familiar with word embedding models such as word2vec, which capture semantic similarity among words and phrases in a corpus. However, word embeddings are limited in their ability to interrogate a corpus alongside other context or over time. Moreover, word embedding models either need significant amounts of data or need tuning through transfer learning to handle the domain-specific vocabulary found in most commercial applications.

In this talk, I will introduce exponential family embeddings. Developed by Rudolph and Blei, these methods extend the idea of word embeddings to other types of high-dimensional data. I will demonstrate how they can be used to conduct advanced topic modeling on medium-sized datasets that are specialized enough to require significant modifications of a word2vec model and that contain more general data types (including categorical, count, and continuous).
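As a rough illustration of the Bernoulli variant of exponential family embeddings, the sketch below computes the probability that a term appears given its context. It assumes the usual formulation in which each term has an embedding vector (here `rho`) and a context vector (here `alpha`), and the log-odds of a term appearing is the inner product of its embedding with the sum of the context vectors around it. The vectors, vocabulary, and function names are hypothetical toy values, not the model or data from the talk.

```python
import math


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_conditional(rho_i, context_alphas):
    """P(term present | context) under a Bernoulli embedding (assumed form).

    rho_i          -- embedding vector of the target term
    context_alphas -- list of context vectors for the surrounding terms
    """
    dim = len(rho_i)
    # Sum the context vectors dimension by dimension.
    context_sum = [sum(a[d] for a in context_alphas) for d in range(dim)]
    # Natural parameter: inner product of the embedding and the context sum.
    eta = sum(r * c for r, c in zip(rho_i, context_sum))
    return sigmoid(eta)


# Hypothetical two-dimensional vectors, chosen by hand for illustration.
rho = {"tensorflow": [0.9, 0.1], "payroll": [-0.8, 0.3]}
alpha = {"python": [1.0, 0.2], "keras": [0.8, -0.1]}

context = [alpha["python"], alpha["keras"]]
p_tensorflow = bernoulli_conditional(rho["tensorflow"], context)
p_payroll = bernoulli_conditional(rho["payroll"], context)
```

With these toy vectors, a term whose embedding aligns with the context ("tensorflow") receives a higher probability than one that does not ("payroll"). Other exponential family choices, such as Poisson for counts or Gaussian for continuous values, would swap the sigmoid for the corresponding link.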

I will discuss how we implemented a dynamic embedding model using TensorFlow and our proprietary corpus of job descriptions. Using both categorical and natural language data associated with jobs, we charted the development of different skill sets over the last 3 years. I will focus my results on how tech and data science skill sets have developed, grown, and pollinated other types of jobs over time. If time allows, I will also discuss other segmentation analyses we performed, including by company type and geography.
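To make the idea of a dynamic embedding concrete, here is a minimal sketch that assumes the common formulation in which each term has one embedding vector per time slice and a Gaussian random-walk prior discourages large jumps between consecutive slices. The trajectory values, the `precision` parameter, and the function names are hypothetical; this is not the TensorFlow implementation described in the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_walk_penalty(trajectory, precision=1.0):
    """Negative log of a Gaussian random-walk prior, up to a constant.

    Penalizes squared differences between embeddings in consecutive
    time slices, so a term's meaning can only shift gradually.
    """
    total = 0.0
    for prev, curr in zip(trajectory, trajectory[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return 0.5 * precision * total


# Hypothetical embedding of one skill term in three yearly slices.
trajectory = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

# Drift of each slice relative to the first: 0 means unchanged usage.
drift = [1.0 - cosine(trajectory[0], v) for v in trajectory]
penalty = random_walk_penalty(trajectory)
```

Tracking `drift` for a term across slices is one simple way to flag skills whose usage has shifted; during training, the penalty would be added to the embedding loss so that slices with little data borrow strength from their neighbors.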

This talk is for both data science practitioners and data engineers. Rather than describe mathematical underpinnings, I will instead focus the discussion on the evolution of topic modeling systems using NLP.

I will specifically discuss the following:

- Introduction to word embedding models (word2vec, GloVe), focussing on barriers to real-world/industrial implementation
- Background on exponential family embeddings (with reference to Rudolph and Blei), focussing on applications of multivariate and Bernoulli models
- Description of the data used to train the model (size, types of data, and the processing steps that we optimized)
- Description of results from the model:
  - What ‘fringe’ tech skills have become ‘core’ tech skills?
  - What tech skills have pollinated other types of roles?
  - What tech skills are likely to be growing in importance over time?

Maryam Jahanshahi


Maryam Jahanshahi is a research scientist at TapRecruit, a platform that uses AI and automation tools to bring efficiency and fairness to the recruiting process. She holds a PhD from the Icahn School of Medicine at Mount Sinai, where she studied molecular regulators of organ size control. Maryam’s long-term research goal is to reduce bias in decision making by using a combination of computational linguistics, machine learning, and behavioral economics methods.
