Unstructured data in the form of documents, web pages, and social media interactions is a growing and increasingly valuable source for addressing current business problems, from exploring brand sentiment to identifying sensitive information in internal documents. Unfortunately, the classification and annotation algorithms used to solve these problems often require large amounts of labeled training data to reach the desired accuracy.
Michael Johnson and Norris Heintzelman share several techniques they’ve implemented to build classification and named entity recognition (NER) models from scratch. They survey this space as it applies to NLP and demonstrate their approach and architecture for the following techniques:
For each of these topics, Michael and Norris outline the theoretical foundation, the implementation architecture, and the tools used, and they discuss the problems they encountered so you can avoid making the same mistakes.
Michael Johnson is a senior data scientist at Lockheed Martin. He has worked in data science and analytics across fields including manufacturing optimization, semiconductor reliability, and human resources-focused time series forecasting and simulation. Recently he has focused on applying cutting-edge deep learning algorithms to NLP domains.
Norris Heintzelman is a senior research and data scientist with 19 years’ real-world experience converting data into knowledge, spanning many areas of natural language processing, knowledge systems, cleaning and normalizing messy data, and rigorous accuracy measurement. Norris has published several papers in the fields of health informatics and general knowledge management. She has worked for Lockheed Martin for a very long time, in multiple business areas, from public sector contracts to advanced R&D to internal business process support. An alumna of both Temple University and the University of Pennsylvania, she lives in Wilmington, Delaware, with her husband, two daughters, and two cats. She likes to eat and talk about food.
©2019, O'Reilly Media, Inc.