Presented By
O’Reilly + Cloudera

29 April–2 May 2019
London, UK

Dealing with data scarcity in natural language processing

Yves Peirsman (NLP Town)

12:05–12:45 Wednesday, 1 May 2019

Data Science, Machine Learning & AI
Location: Capital Suite 14

Secondary topics: Text and Language processing and analysis

Average rating: 4.57 (7 ratings)

Who is this presentation for?

NLP and machine learning professionals

Level

Intermediate

Prerequisite knowledge

A basic understanding of machine learning and data science

What you'll learn

How to build high-quality NLP models with fewer labeled training examples

Description

It’s often said we live in the age of big data. Therefore, it may come as a surprise that in the field of natural language processing, machine learning professionals are often faced with data scarcity. Many organizations that would like to apply NLP lack a sufficiently large collection of labeled text in their language or domain to train a high-quality NLP model.

Luckily, there’s a wide variety of ways to address this challenge. First, approaches such as active learning reduce the number of training instances that have to be labeled in order to build a high-quality NLP model. Second, techniques such as distant supervision and proxy-label approaches can help label training examples automatically. Finally, recent developments in semisupervised learning, transfer learning, and multitask learning help models improve by making better use of unlabeled data or training them on several tasks at the same time.
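To make the first of these ideas concrete, here is a minimal sketch of uncertainty sampling, one common form of active learning. It is an illustration under stated assumptions, not the method presented in the talk: the function names `entropy` and `select_for_labeling` and the toy probabilities are hypothetical, and the predicted class probabilities are assumed to come from some already trained classifier.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, k):
    """Return the indices of the k unlabeled items the model is least sure about.

    pool_probs holds one predicted class distribution per unlabeled item.
    Items with the highest entropy are ranked first.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predictions of a binary classifier on four unlabeled texts.
pool_probs = [
    [0.98, 0.02],  # confident prediction, little to gain from a label
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

to_label = select_for_labeling(pool_probs, k=2)
print(to_label)  # [3, 1]
```

In a full loop, the selected items would be sent to a human annotator, added to the training set, and the model retrained before the next selection round, so that labeling effort goes to the examples expected to help most.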

Yves Peirsman offers an overview of these approaches and discusses their advantages and disadvantages, illustrating their effectiveness with example projects that his company, NLP Town, has worked on in the past few years.

Yves Peirsman

NLP Town

Yves Peirsman is the founder and natural language processing expert at NLP Town. Yves started his career as a PhD student at the University of Leuven and a postdoctoral researcher at Stanford University. Since he made the move from academia to industry, he has gained extensive experience in consultancy and software development for NLP projects in Belgium and abroad.

Comments

Lorenzo Ansaloni | SENIOR PRINCIPAL SOFTWARE ENGINEER

3/05/2019 11:51 BST

Are the slides of the talk available somewhere?

Yves Peirsman

9/05/2019 17:02 BST

Hi Lorenzo, you can find the slides here: https://www.slideshare.net/YvesPeirsman/strata-conference-dealing-with-data-scarcity-in-natural-language-processing
