
Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK


Spark NLP in action: How Indeed applies NLP to standardize résumé content at scale

Alexander Thomas (John Snow Labs), Alexis Yelton (Indeed)

11:15–11:55 Wednesday, 1 May 2019

Data Science, Machine Learning & AI
Location: Capital Suite 14

Secondary topics: Deep Learning, Media, Marketing, Advertising, Text and Language processing and analysis

Average rating: 4.67 (3 ratings)


Who is this presentation for?

Data scientists and software developers who work with text

Level

Intermediate

Prerequisite knowledge

Familiarity with Apache Spark

What you'll learn

Learn how to use Spark NLP to process text and how to standardize text fields

Description

More people find jobs on Indeed than anywhere else. With two hundred million unique visitors a month, Indeed has accumulated hundreds of millions of jobs and résumés and trillions of activity data points. Much of this data is entered by users. Because users express the same or similar facts in different ways, Indeed needs to standardize these fields. The traditional solution is a human-curated list of replacement rules, but with datasets as large and diverse as Indeed's, the better solution is to let the data normalize itself.
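The idea of letting the data normalize itself can be illustrated with a small sketch. The plain-Python code below is not Indeed's implementation and does not use Spark NLP. It assumes a hypothetical `normalize_key` function that folds case, punctuation, and whitespace. Raw entries that share a key are grouped, and the most frequent raw spelling in each group is treated as the preferred form, in place of a hand-curated replacement list.

```python
import re
from collections import Counter, defaultdict

def normalize_key(text: str) -> str:
    """Hypothetical key: fold case, punctuation, and whitespace so variants collide."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())         # collapse runs of whitespace

def preferred_forms(entries):
    """Map each raw entry to the most frequent raw spelling sharing its key."""
    groups = defaultdict(Counter)
    for raw in entries:
        groups[normalize_key(raw)][raw.strip()] += 1
    # Corpus statistics decide the canonical form: the most common surface string wins.
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return {raw: canonical[normalize_key(raw)] for raw in entries}

# Toy job titles, invented for illustration only.
titles = [
    "Software Engineer", "software engineer", "Software  Engineer.",
    "Software Engineer", "Data Scientist", "data-scientist",
    "Data Scientist",
]
mapping = preferred_forms(titles)
print(mapping["software engineer"])  # Software Engineer
print(mapping["data-scientist"])     # Data Scientist
```

On a real corpus, the grouping key would need to be far richer (spelling correction, stemming, synonyms), but the principle is the same: the frequencies in the data, not a fixed rule list, choose the standard form.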

Spark NLP, John Snow Labs' NLP library for Apache Spark, is an open source library that natively extends Spark ML to provide natural language processing with high performance, accuracy, and scalability. It includes rule-based, machine learning, and deep learning models and provides advanced NLP functionality such as named-entity recognition, fact extraction, spell checking, sentiment analysis, and assertion status detection. These algorithms are combined in NLP pipelines that automate the multiple steps needed to normalize natural language text, from spelling correction to stemming to using corpus statistics to identify preferred forms.
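A rough sketch of chaining normalization steps into a pipeline, using plain-Python stand-ins rather than Spark NLP annotators. The tiny vocabulary `VOCAB`, the `difflib`-based spell corrector, and the crude suffix-stripping `stem` are all hypothetical illustrations. In Spark NLP, the equivalent stages would be annotators assembled into a Spark ML `Pipeline`.

```python
import difflib

# Hypothetical vocabulary; in practice it would be derived from the corpus itself.
VOCAB = ["engineer", "engineering", "manager", "management", "senior", "software"]

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def spell_correct(tokens, vocab=VOCAB):
    """Replace each token with its closest vocabulary word, if one is close enough."""
    out = []
    for tok in tokens:
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=0.8)
        out.append(match[0] if match else tok)
    return out

def stem(tokens):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    out = []
    for tok in tokens:
        for suffix in ("ing", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        out.append(tok)
    return out

def run_pipeline(text, stages):
    """Feed each stage the previous stage's output."""
    result = text
    for stage in stages:
        result = stage(result)
    return result

PIPELINE = [tokenize, spell_correct, stem, " ".join]

print(run_pipeline("Senoir Sofware Engineers", PIPELINE))  # senior software engineer
```

Because every stage consumes the previous stage's output, steps can be added, removed, or reordered independently. The final `" ".join` turns the tokens back into a single standardized string.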

Alexis Yelton and Alex Thomas explain how to combine Spark NLP with Apache Spark’s built-in algorithms to create standardized semistructured text directly from résumés and job descriptions. These standardized strings can then be used to improve résumé or job search engines or to feed into machine learning models used for everything from predicting apply rates to recommending jobs to job seekers. Join in to explore the technical challenges, the algorithms, and how you can use them in your next text-processing project.

Alexander Thomas

John Snow Labs

Alex Thomas is a data scientist at John Snow Labs. He’s used natural language processing (NLP) and machine learning with clinical data, identity data, and job data. He’s worked with Apache Spark since version 0.9 as well as with NLP libraries and frameworks including UIMA and OpenNLP.

Alexis Yelton

Indeed

Alexis Yelton is a data scientist at Indeed focusing on building machine learning models for software products. She’s been working with Spark since version 1.6 and has recently moved into the NLP space. She holds a PhD in bioinformatics and did postdoctoral work building models to predict gene function and explain ecosystem function.


Comments

Alexis Yelton | DATA SCIENTIST

6/05/2019 16:09 BST

I have posted the slides on LinkedIn: www.linkedin.com/in/alexisyelton

Lorenzo Ansaloni | SENIOR PRINCIPAL SOFTWARE ENGINEER

3/05/2019 11:47 BST

Is it possible to get the slides of your presentation?


Contact Us

View a complete list of Strata Data Conference contacts

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com