Presented By O’Reilly and Intel AI
Put AI to Work
April 29-30, 2018: Training
April 30-May 2, 2018: Tutorials & Conference
New York, NY

word2vec and friends

Bruno Gonçalves (Data For Science)
9:00am–12:30pm Monday, April 30, 2018
Implementing AI
Location: Nassau East/West
Average rating: 3.80 (5 ratings)

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • A basic understanding of linear algebra and calculus

Materials or downloads needed in advance

  • A laptop with Python 3.5+ and TensorFlow installed

What you'll learn

  • Explore the main algorithms underlying word embeddings and their applications

Description

Word embeddings have received a lot of attention ever since Tomas Mikolov published word2vec in 2013 and showed that the embeddings that a neural network learned by “reading” a large corpus of text preserved semantic relations between words. As a result, this type of embedding began to be studied in more detail and applied to more serious NLP and IR tasks, such as summarization and query expansion. More recently, researchers and practitioners alike have come to appreciate the power of this type of approach, creating a burgeoning cottage industry centered around applying Mikolov’s original approach to different areas.

Bruno Gonçalves explores word2vec and its variations. He begins with an intuitive overview of the main concepts and algorithms underlying the neural network architecture used in word2vec, walks through word2vec’s reference implementation in TensorFlow, and shares some of the applications word embeddings have found in various areas. He then presents a bird’s-eye view of the emerging field of “anything”-2vec methods (dna2vec, node2vec, etc.) that use variations of the word2vec neural network architecture.

Outline:

Neural network architecture and algorithms underlying word2vec

  • Basic intuition
  • Skip-gram
  • Softmax
  • Cross-entropy
  • BackProp
  • Online sources for pretrained embeddings
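To make the first three outline items concrete, here is a minimal illustrative sketch (not the talk’s own code) of the skip-gram forward pass with a softmax output and cross-entropy loss, in plain NumPy; the vocabulary size, embedding dimension, and weight matrices are made-up toy values:

```python
import numpy as np

np.random.seed(0)
V, d = 5, 3  # toy vocabulary size and embedding dimension (illustrative)

W_in = np.random.randn(V, d) * 0.1   # input ("center" word) embeddings
W_out = np.random.randn(d, V) * 0.1  # output ("context" word) weights

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def skipgram_loss(center, context):
    """Cross-entropy loss for predicting one context word from a center word."""
    h = W_in[center]        # the hidden layer is just the center word's embedding
    p = softmax(h @ W_out)  # probability distribution over the whole vocabulary
    return -np.log(p[context])
```

Backpropagation then amounts to taking gradients of this loss with respect to W_in and W_out; after training, the rows of W_in are the word embeddings.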

Properties and applications of word embeddings

  • Visualization
  • Analogies
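The analogy property is the famous "king − man + woman ≈ queen" vector arithmetic. A toy sketch of the idea (the 2-d vectors below are hand-crafted so the analogy holds; real embeddings are learned and high-dimensional):

```python
import numpy as np

# Hand-crafted toy embeddings: one axis roughly encodes "royalty",
# the other roughly encodes "gender" (illustrative only)
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.5, 0.5]),
}

def analogy(a, b, c):
    """Return the word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = emb[a] - emb[b] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in emb if w not in (a, b, c)),
               key=lambda w: cos(emb[w], target))

print(analogy("king", "man", "woman"))  # prints "queen"
```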

A brief overview of TensorFlow

  • Installation
  • Computational graph
  • Simple example (linear fitting)
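The linear-fitting example boils down to defining a loss over parameters and minimizing it by gradient descent; since the exact TensorFlow API differs across versions, the same idea in plain NumPy (synthetic data and hyperparameters are made up for illustration):

```python
import numpy as np

np.random.seed(42)
# Synthetic data drawn from y = 2x + 1 with a little noise
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.01 * np.random.randn(50)

w, b = 0.0, 0.0  # parameters to learn
lr = 0.5         # learning rate

for _ in range(500):
    y_hat = w * x + b                      # forward pass
    grad_w = 2 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(MSE)/db
    w -= lr * grad_w                       # gradient-descent update
    b -= lr * grad_b
```

In TensorFlow, the forward pass is declared as a computational graph and the gradients are derived automatically rather than written out by hand.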

A detailed discussion of TensorFlow’s reference implementation

word2vec variations and their applications


Bruno Gonçalves

Data For Science

Bruno Gonçalves is a chief data scientist at Data For Science, working at the intersection of data science and finance. Previously, he was a data science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the physics of complex systems in 2008, he’s been pursuing the use of data science and machine learning to study human behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he has studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of computational linguistics, information diffusion, behavioral change, and epidemic spreading. In 2015, he was awarded the Complex Systems Society’s 2015 Junior Scientific Award for “outstanding contributions in complex systems science,” and in 2018 he was named a science fellow of the Institute for Scientific Interchange in Turin, Italy.


Comments

Bruno Gonçalves | CHIEF DATA SCIENTIST
05/03/2018 4:33am EDT

Thank you! I’m glad you found it interesting. You can find the slides and all the code on the course’s GitHub: https://github.com/bmtgoncalves/word2vec-and-friends

Joby Thomas | MANAGER APPLICATIONS AND DECISION SUPPORT SYSTEMS
05/03/2018 3:58am EDT

Hello excellent presentation! Would it be possible to get a copy of the slides? Thanks so much.