It’s often said that we live in the age of big data. It may therefore come as a surprise that, in the field of natural language processing, machine learning professionals are often faced with data scarcity. Many organizations that would like to apply NLP lack a sufficiently large collection of labeled text in their language or domain to train a high-quality NLP model.
Luckily, there’s a wide variety of ways to address this challenge. First, approaches such as active learning reduce the number of training instances that have to be labeled in order to build a high-quality NLP model. Second, techniques such as distant supervision and proxy-label approaches can help label training examples automatically. Finally, recent developments in semi-supervised learning, transfer learning, and multitask learning help models improve by making better use of unlabeled data or by training them on several tasks at the same time.
Yves Peirsman offers an overview of these approaches and discusses their advantages and disadvantages, illustrating their effectiveness with example projects that his company, NLP Town, has worked on in the past few years.
Yves Peirsman is the founder and natural language processing expert at NLP Town. Yves started his career as a PhD student at the University of Leuven and a postdoctoral researcher at Stanford University. Since he made the move from academia to industry, he has gained extensive experience in consultancy and software development for NLP projects in Belgium and abroad.
©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
Are the slides of the talk available somewhere?
Hi Lorenzo, you can find the slides here: https://www.slideshare.net/YvesPeirsman/strata-conference-dealing-with-data-scarcity-in-natural-language-processing