Presented By O’Reilly and Intel AI

Put AI to Work

April 29-30, 2018: Training

April 30-May 2, 2018: Tutorials & Conference

New York, NY

word2vec and friends

Bruno Gonçalves (Data For Science)

9:00am–12:30pm Monday, April 30, 2018

Implementing AI
Location: Nassau East/West

Who is this presentation for?

Data scientists

Prerequisite knowledge

A basic understanding of linear algebra and calculus

Materials or downloads needed in advance

A laptop with Python 3.5+ and TensorFlow installed

What you'll learn

Explore the main algorithms underlying word embeddings and their applications

Description

Word embeddings have received a lot of attention ever since Tomas Mikolov published word2vec in 2013 and showed that the embeddings a neural network learns by “reading” a large corpus of text preserve semantic relations between words. As a result, this type of embedding began to be studied in more detail and applied to more demanding natural language processing (NLP) and information retrieval (IR) tasks, such as summarization and query expansion. More recently, researchers and practitioners alike have come to appreciate the power of this approach, creating a burgeoning cottage industry centered around applying Mikolov’s original method to different areas.

Bruno Gonçalves explores word2vec and its variations. He starts with an intuitive overview of the main concepts and algorithms behind the neural network architecture used in word2vec, walks through the word2vec reference implementation in TensorFlow, and shares some of the applications word embeddings have found in various areas. He then presents a bird’s-eye view of the emerging field of “anything”-2vec methods (dna2vec, node2vec, etc.) that use variations of the word2vec neural network architecture.

Outline:

Neural network architecture and algorithms underlying word2vec

Basic intuition
Skip-gram
Softmax
Cross-entropy
BackProp
Online sources for pretrained embeddings
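
The pieces listed above (skip-gram pairs, softmax, cross-entropy, and a backpropagation step) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used in the session: the function names, the toy sentence, the embedding size, and the learning rate are all made up for the example, and it uses a full softmax rather than the sampling tricks that practical implementations rely on.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(W_in, W_out, center, context, lr=0.1):
    """One SGD step on a single (center, context) pair of word ids.

    Returns the cross-entropy loss measured before the update.
    """
    h = W_in[center][:]  # hidden layer = the center word's embedding (copied)
    scores = [sum(hd * od for hd, od in zip(h, out)) for out in W_out]
    probs = softmax(scores)
    loss = -math.log(probs[context])  # cross-entropy against a one-hot target
    # Gradient of the loss with respect to the scores is (probs - one_hot).
    errs = [p - (1.0 if k == context else 0.0) for k, p in enumerate(probs)]
    # Backpropagate to the hidden layer before touching W_out.
    grad_h = [sum(errs[k] * W_out[k][d] for k in range(len(W_out)))
              for d in range(len(h))]
    for k, out in enumerate(W_out):
        for d in range(len(h)):
            out[d] -= lr * errs[k] * h[d]
    for d in range(len(h)):
        W_in[center][d] -= lr * grad_h[d]
    return loss

# Hypothetical toy corpus and hyperparameters, chosen only for illustration.
tokens = "the quick brown fox jumps".split()
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
pairs = [(vocab[c], vocab[o]) for c, o in skipgram_pairs(tokens, window=1)]

rng = random.Random(0)
dim = 3
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

losses = []
for epoch in range(200):
    total = sum(train_step(W_in, W_out, c, o) for c, o in pairs)
    losses.append(total / len(pairs))

print("first epoch loss:", losses[0])
print("last epoch loss: ", losses[-1])
```

After training, the rows of W_in play the role of the word embeddings. On a corpus this small they carry no real meaning; the point is only to show how the four ingredients fit together.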

Properties and applications of word embeddings

Visualization
Analogies
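
The analogy property can be illustrated with vector arithmetic and cosine similarity. The sketch below is an assumption-laden toy: the three-dimensional vectors are written by hand so the arithmetic is easy to follow, they are not learned embeddings, and the helper names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the nearest neighbor of b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is commonly done.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hand-made toy vectors (NOT learned); axes are roughly (royalty, male, female).
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

result = analogy("man", "king", "woman", toy)
print(result)  # queen
```

With real pretrained embeddings the same arithmetic applies, only over a vocabulary of many thousands of words and a few hundred dimensions.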

A brief overview of TensorFlow

Installation
Computational graph
Simple example (linear fitting)
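
For orientation, a linear fit by gradient descent looks roughly like the sketch below. This is a framework-free version in plain Python, not TensorFlow code and not the example from the session: in TensorFlow the same loss would be expressed as a computational graph and the gradients would be derived automatically, whereas here they are written out by hand. The data, learning rate, and step count are assumptions picked for the example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: prediction errors for the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Hand-derived gradients of mean((w * x + b - y) ** 2).
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Parameter update.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = fit_line(xs, ys)
print(w, b)  # approaches w = 2, b = 1
```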

A detailed discussion of TensorFlow’s reference implementation

word2vec variations and their applications

Bruno Gonçalves

Data For Science

Bruno Gonçalves is a chief data scientist at Data For Science, working at the intersection of data science and finance. Previously, he was a data science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the physics of complex systems in 2008, he’s been pursuing the use of data science and machine learning to study human behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of computational linguistics, information diffusion, behavioral change, and epidemic spreading. In 2015, he was awarded the Complex Systems Society’s Junior Scientific Award for “outstanding contributions in complex systems science,” and in 2018 he was named a science fellow of the Institute for Scientific Interchange in Turin, Italy.

Comments

Bruno Goncalves | CHIEF DATA SCIENTIST

05/03/2018 4:33am EDT

Thank you! I’m glad you found it interesting. You can find the slides and all the code on the course’s GitHub: https://github.com/bmtgoncalves/word2vec-and-friends

Joby Thomas | MANAGER APPLICATIONS AND DECISION SUPPORT SYSTEMS

05/03/2018 3:58am EDT

Hello, excellent presentation! Would it be possible to get a copy of the slides? Thanks so much.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com