Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

The evolution of data science skill sets: An analysis using exponential family embeddings

Maryam Jahanshahi (TapRecruit)
14:05–14:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Average rating: 4.00 (3 ratings)

Who is this presentation for?

  • NLP practitioners and ML engineers



Prerequisite knowledge

  • A basic understanding of word embedding models such as word2vec (useful but not required)

What you'll learn

  • Understand the strengths and weaknesses of different embedding methods (pretrained versus custom)
  • Learn how to map trends from a combination of natural language and structured data, and see how data science skills have varied across industries, functions, and time


Many data scientists are familiar with word embedding models such as word2vec, which capture semantic similarity among words and phrases in a corpus. However, word embeddings are limited in their ability to interrogate a corpus alongside other context or over time. Moreover, word embedding models need either significant amounts of data or tuning through transfer learning to handle the domain-specific vocabulary that most commercial applications have.

Maryam Jahanshahi explores exponential family embeddings. Developed by Rudolph and Blei, these methods extend the idea of word embeddings to other types of high-dimensional data. Maryam demonstrates how this technique can be used to conduct advanced topic modeling on datasets that are medium-sized, are specialized enough to require significant modification of a word2vec model, and contain more general data types (including categorical, count, and continuous).
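
As a rough illustration of how one embedding idea can cover several data types, the sketch below follows the general form of exponential family embeddings: each observation is modeled conditionally on its context, with a natural parameter built from the observation's embedding vector and the context vectors of its neighbors. The function names, the identity link, and the toy vectors are assumptions made for illustration; this is not the model presented in the session.

```python
import math

def natural_parameter(rho_i, alphas, context_values):
    """eta_i = rho_i . sum_j(alpha_j * x_j), using an identity link (an assumption).

    rho_i          -- embedding vector of the item being modeled
    alphas         -- context vectors of the items in its context
    context_values -- observed values x_j of those context items
    """
    dim = len(rho_i)
    context_sum = [0.0] * dim
    for alpha_j, x_j in zip(alphas, context_values):
        for k in range(dim):
            context_sum[k] += alpha_j[k] * x_j
    return sum(r * c for r, c in zip(rho_i, context_sum))

def bernoulli_log_prob(x, eta):
    """Binary or categorical indicator data: log p(x) = x*eta - log(1 + e^eta)."""
    return x * eta - math.log1p(math.exp(eta))

def poisson_log_prob(x, eta):
    """Count data: log p(x) = x*eta - e^eta - log(x!)."""
    return x * eta - math.exp(eta) - math.lgamma(x + 1)

def gaussian_log_prob(x, eta):
    """Continuous data with unit variance, so the mean equals eta."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - eta) ** 2

# Toy two-dimensional vectors, invented for illustration only.
rho = [1.0, 0.0]
alphas = [[0.5, 0.0], [0.5, 0.0]]
eta = natural_parameter(rho, alphas, [1, 1])

print("eta:", eta)
print("count log-likelihood:", poisson_log_prob(2, eta))
print("binary log-likelihood:", bernoulli_log_prob(1, eta))
print("continuous log-likelihood:", gaussian_log_prob(0.8, eta))
```

Only the conditional distribution changes between data types; the embedding and context vectors play the same role in each case, which is what lets the approach handle categorical, count, and continuous data side by side.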

You’ll learn how TapRecruit implemented a dynamic embedding model using gensim, TensorFlow, and its proprietary corpus of job descriptions. Using both categorical and natural language data associated with jobs, TapRecruit charted the development of different skill sets over the last three years. Maryam focuses on how data science and quantitative skill sets have developed, grown, and pollinated other types of jobs over time. If time allows, she’ll also discuss other segmentation analyses that were performed, including company types and geographies.
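
One way to read trends out of a dynamic embedding model is to compare how close two terms sit in each time slice. The sketch below assumes a model has already produced one embedding per term per slice; the slice labels, terms, and vectors are invented toy values, not TapRecruit data or results, and `similarity_trend` is a hypothetical helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similarity_trend(embeddings_by_slice, term_a, term_b):
    """Return {slice_label: cosine similarity of term_a and term_b in that slice}."""
    return {
        label: cosine(vectors[term_a], vectors[term_b])
        for label, vectors in sorted(embeddings_by_slice.items())
    }

# Invented toy embeddings: one dictionary of term vectors per time slice.
embeddings_by_slice = {
    "year_1": {"python": [1.0, 0.0], "marketing": [0.0, 1.0]},
    "year_2": {"python": [1.0, 0.0], "marketing": [1.0, 1.0]},
    "year_3": {"python": [1.0, 0.0], "marketing": [1.0, 0.0]},
}

trend = similarity_trend(embeddings_by_slice, "python", "marketing")
for label, score in trend.items():
    print(label, round(score, 3))
```

A rising similarity between a skill term and a job-function term across slices would suggest that the skill is moving into that type of job, which is the kind of trend the abstract describes.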


Maryam Jahanshahi


Maryam Jahanshahi is a research scientist at TapRecruit, a platform that uses AI and automation tools to bring efficiency and fairness to the recruiting process. She holds a PhD from the Icahn School of Medicine at Mount Sinai, where she studied molecular regulators of organ-size control. Maryam’s long-term research goal is to reduce bias in decision making by using a combination of computational linguistics, machine learning, and behavioral economics methods.