Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Dealing with data scarcity in natural language processing

Yves Peirsman (NLP Town)
12:05–12:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Average rating: 4.57 (7 ratings)

Who is this presentation for?

  • NLP and machine learning professionals



Prerequisite knowledge

  • A basic understanding of machine learning and data science

What you'll learn

  • Learn how to build high-quality NLP models from fewer labeled training examples


It’s often said that we live in the age of big data, so it may come as a surprise that machine learning professionals working in natural language processing are often faced with data scarcity. Many organizations that would like to apply NLP lack a sufficiently large collection of labeled text in their language or domain to train a high-quality NLP model.

Luckily, there are many ways to address this challenge. First, approaches such as active learning reduce the number of training instances that have to be labeled in order to build a high-quality NLP model. Second, techniques such as distant supervision and proxy-label approaches can label training examples automatically. Finally, recent developments in semisupervised learning, transfer learning, and multitask learning help models improve by making better use of unlabeled data or by training them on several tasks at the same time.

Yves Peirsman offers an overview of these approaches and discusses their advantages and disadvantages, illustrating their effectiveness with example projects that his company NLP Town has worked on over the past few years.
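To make two of these ideas more concrete, here is a minimal Python sketch. It is not taken from the talk or from NLP Town's projects. `least_confident` illustrates least-confidence uncertainty sampling, one common active learning strategy: given class probabilities from any model, it picks the unlabeled examples the model is least sure about, so annotators label those first. `distant_label` illustrates distant supervision with a keyword lexicon: it assigns a label automatically when lexicon words vote for one class and abstains otherwise. The function names, the example probabilities, and `HYPOTHETICAL_LEXICON` are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Sequence


def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Active learning (least-confidence sampling): return the indices of the
    k unlabeled examples whose highest predicted class probability is lowest,
    i.e. the examples the model is least sure about."""
    confidence = [max(p) for p in probs]
    # Sort indices by ascending confidence and keep the k least confident.
    return sorted(range(len(probs)), key=lambda i: confidence[i])[:k]


# Hypothetical lexicon for illustration only; a real project might derive a
# much larger one from an existing resource such as a database or knowledge base.
HYPOTHETICAL_LEXICON: Dict[str, str] = {
    "excellent": "positive",
    "great": "positive",
    "terrible": "negative",
    "broken": "negative",
}


def distant_label(text: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Distant supervision: label a text with the majority label of the
    lexicon words it contains; return None (abstain) on no match or a tie."""
    tokens = re.findall(r"[a-z']+", text.lower())
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not votes:
        return None  # no evidence: leave the example unlabeled
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting evidence: too ambiguous to label
    return ranked[0][0]


if __name__ == "__main__":
    # Class probabilities a model might output for four unlabeled texts.
    probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]
    print(least_confident(probs, 2))  # [3, 1]
    print(distant_label("The battery life is excellent!", HYPOTHETICAL_LEXICON))  # positive
    print(distant_label("Great screen, terrible battery.", HYPOTHETICAL_LEXICON))  # None
```

In an active learning loop, the selected examples would be labeled by hand, added to the training set, and the model retrained before the next round. Labels produced by distant supervision are noisy, so they are often combined with a smaller set of hand-labeled data rather than used on their own.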


Yves Peirsman

NLP Town

Yves Peirsman is the founder and natural language processing expert at NLP Town. Yves started his career as a PhD student at the University of Leuven and a postdoctoral researcher at Stanford University. Since he made the move from academia to industry, he has gained extensive experience in consultancy and software development for NLP projects in Belgium and abroad.



Yves Peirsman | 9/05/2019 17:02 BST

Hi Lorenzo, you can find the slides here:

3/05/2019 11:51 BST

Are the slides of the talk available somewhere?