Textual information is often presented in structured documents with an inherent 2D layout, particularly with the advent of newer media and communication formats such as presentations, websites, blogs, and formatted notebooks. In such documents, layout, positioning, and sizing can be crucial to understanding the semantic content, and they strongly guide human perception.
Natural language processing (NLP) addresses the task of processing and understanding plain text. However, it serializes the text first, discarding any 2D structure. Computer vision (CV), on the other hand, can process document images and so retains the structure, but it must learn the document's semantics from the image pixels.
Anoop Katti explores the shortcomings of existing techniques for understanding 2D documents and offers an overview of the Character Grid (Chargrid), a new processing pipeline pioneered by data scientists at SAP that retains the original 2D structure while directly encoding the characters in the text. The Character Grid representation can be used readily as input to deep neural networks. Anoop applies Chargrid to the task of information extraction from invoices to show how it captures the best of both NLP and CV.
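To make the idea of "retaining the 2D structure while directly encoding the characters" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the SAP implementation: the names `CharBox`, `build_vocab`, and `build_chargrid` are hypothetical, the character boxes are assumed to come from an upstream OCR or document-parsing step, and the page is assumed to be already scaled to a small integer grid. Each grid cell covered by a character's bounding box stores that character's integer index, and 0 marks background.

```python
from typing import Dict, List, NamedTuple


class CharBox(NamedTuple):
    """One character and its bounding box (hypothetical input format)."""
    char: str
    x: int       # left edge, in grid cells
    y: int       # top edge, in grid cells
    width: int
    height: int


BACKGROUND = 0  # assumption: index 0 is reserved for "no character here"


def build_vocab(boxes: List[CharBox]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 1."""
    chars = sorted({b.char for b in boxes})
    return {c: i + 1 for i, c in enumerate(chars)}


def build_chargrid(boxes: List[CharBox], page_width: int, page_height: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Fill every cell covered by a character box with that character's index."""
    grid = [[BACKGROUND] * page_width for _ in range(page_height)]
    for b in boxes:
        idx = vocab[b.char]
        # Clip the box to the page so out-of-range boxes cannot raise errors.
        for row in range(max(b.y, 0), min(b.y + b.height, page_height)):
            for col in range(max(b.x, 0), min(b.x + b.width, page_width)):
                grid[row][col] = idx
    return grid


# Toy page: "No" in the top-left corner and "7" in the bottom-right corner.
boxes = [
    CharBox("N", 0, 0, 2, 2),
    CharBox("o", 2, 0, 2, 2),
    CharBox("7", 6, 3, 2, 2),
]
vocab = build_vocab(boxes)
grid = build_chargrid(boxes, page_width=8, page_height=5, vocab=vocab)
for row in grid:
    print("".join(str(v) for v in row))
```

Unlike a serialized string such as "No 7", the grid keeps the information that "7" sits far to the right of and below "No". Before being passed to a neural network, the integer indices would typically be one-hot encoded into one channel per character; that step is an assumption here and is left out of the sketch.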
Chargrid has been accepted for presentation at EMNLP 2018 and is deployed in the production system of SAP Concur, where it currently processes tens of thousands of invoices every month.
Anoop Katti is a data scientist in the Deep Learning Center at SAP, where he combines computer vision with techniques from natural language processing to work on documents with strong 2D structure. Previously, he built telecom software at Huawei. Anoop holds multiple patents and publications in the field. He completed his bachelor's studies at BIT, Bangalore, and a research-based master’s in computer vision at IIT Madras.