In natural language processing, general-purpose APIs and generic models are often far less accurate than you need, and sometimes the API you need doesn’t exist at all. Either way, you can use “corpus bootstrapping” to create custom models and APIs. Corpus bootstrapping is a method for rapidly producing a custom corpus that you can use to train highly accurate natural language processing models. For example, suppose you want to do sentiment analysis on Spanish text, but you can only find APIs and models for English. Or you want to extract phrases that are not exactly noun phrases. Or you want to classify text, but no existing corpus has the categories you’re interested in. All of these problems can be solved by iterating your way to a custom corpus for training custom models.
This talk will cover:
Code examples will be in Python using NLTK.
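As a rough illustration of what "iterating your way to a custom corpus" can look like, here is a minimal sketch using NLTK's `NaiveBayesClassifier`. It assumes a simple self-training loop: train on a small hand-labeled seed, label the unlabeled texts, and keep only the confident predictions for the next round. The `features` and `bootstrap` helpers, the 0.8 threshold, and the tiny Spanish examples are all hypothetical and chosen for illustration; the talk's actual procedure may differ, and in practice you would likely review the added labels by hand.

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    """Bag-of-words features: each lowercase whitespace token maps to True."""
    return {word: True for word in text.lower().split()}


def train(corpus):
    """Train a Naive Bayes classifier on (text, label) pairs."""
    return NaiveBayesClassifier.train(
        [(features(text), label) for text, label in corpus]
    )


def bootstrap(seed, unlabeled, threshold=0.8, max_rounds=5):
    """Grow a labeled corpus from a small seed (hypothetical helper).

    seed:      list of (text, label) pairs labeled by hand
    unlabeled: list of raw texts
    Returns (classifier, corpus, pool), where pool holds the texts
    that never reached the confidence threshold.
    """
    corpus = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        classifier = train(corpus)
        confident, remaining = [], []
        for text in pool:
            dist = classifier.prob_classify(features(text))
            label = dist.max()
            if dist.prob(label) >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to add, so stop iterating
        corpus.extend(confident)
        pool = remaining
    # Retrain so the returned classifier reflects the final corpus.
    return train(corpus), corpus, pool


# Illustrative data only: a tiny hand-labeled seed and some unlabeled text.
seed = [
    ("muy bueno excelente", "pos"),
    ("me encanta este producto", "pos"),
    ("muy malo terrible", "neg"),
    ("no me gusta este producto", "neg"),
]
unlabeled = [
    "excelente servicio muy bueno",
    "terrible servicio muy malo",
    "el paquete llegó el martes",
]

classifier, corpus, pool = bootstrap(seed, unlabeled)
```

The threshold controls the trade-off between corpus growth and label noise: a higher value adds fewer examples per round but makes it less likely that wrong labels are fed back into training.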