Textual information is often conveyed through structured documents that have an inherent 2D layout. This is increasingly the case with newer media such as presentations, websites, blogs, and formatted notebooks. In such documents, layout, positioning, and sizing can be crucial to understanding the semantic content, and they strongly guide human perception.
Natural language processing (NLP) addresses the task of processing and understanding plain text. However, it serializes the text first and thereby ignores any 2D structure. Computer vision (CV), on the other hand, can process document images directly. This retains the structure, but the document semantics must then be learned all the way from the image pixels. We introduce a new representation for 2D documents, the character grid (chargrid), that retains the original 2D structure while directly encoding the characters in the text. The chargrid representation can readily be used as input to, for example, deep neural networks. We apply chargrid to the task of information extraction from invoices and show that it captures the best of both worlds, NLP and CV. Chargrid has been accepted for presentation at EMNLP 2018 and is deployed in the production system of SAP Concur, where it currently processes tens of thousands of invoices every month.
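To make the idea of a character grid concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that an OCR step has already produced one bounding box per character, and it fills the area of each box with an integer index for that character, leaving 0 for background. The input format `CharBox`, the function name `build_chargrid`, the lowercasing step, and the choice to map unknown characters to 0 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (character, left, top, width, height) in pixels.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(
    boxes: List[CharBox],
    height: int,
    width: int,
    vocab: Dict[str, int],
) -> List[List[int]]:
    """Return a height x width grid of character indices (0 = background)."""
    grid = [[0] * width for _ in range(height)]
    for ch, left, top, w, h in boxes:
        # Assumption: case is ignored and unknown characters become background.
        idx = vocab.get(ch.lower(), 0)
        # Clip the box to the page so out-of-range boxes cannot raise errors.
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                grid[y][x] = idx
    return grid


# Index 0 is reserved for background; letters and digits start at 1.
vocab = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz0123456789")}

# A toy 3 x 8 "page" with the characters "N", "o" and "7".
boxes = [("N", 0, 0, 2, 2), ("o", 2, 0, 2, 2), ("7", 5, 1, 2, 2)]
grid = build_chargrid(boxes, height=3, width=8, vocab=vocab)

for row in grid:
    print(row)
```

Each cell of the resulting grid carries the identity of the character that covers it, so the spatial arrangement of the text is preserved. Before such a grid is fed to a neural network, the indices would typically be one-hot encoded so that each character becomes its own input channel.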
Chargrid: Towards Understanding 2D Documents (https://arxiv.org/pdf/1809.08799.pdf), EMNLP 2018
Anoop Katti is a Data Scientist in the Deep Learning center at SAP. He completed his bachelor's studies at BIT, Bangalore. After a year of building telecom software at Huawei, he pursued a research-based master's in Computer Vision at IIT Madras. At SAP, he has worked extensively on documents with strong 2D structure, combining his prior experience in Computer Vision with techniques from Natural Language Processing. Anoop holds multiple patents and publications in the field.