Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Chargrid: Understanding 2D documents

Anoop Katti (SAP)
2:40pm-3:20pm Thursday, April 18, 2019
Machine Learning, Models and Methods
Location: Grand Ballroom West
Secondary topics: Computer Vision, Models and Methods, Text, Language, and Speech

Who is this presentation for?

ML Practitioners, Data Scientists, ML Researchers

Level

Intermediate

Prerequisite knowledge

Deep learning (CNN, RNN), object detection in computer vision, sequence modeling in natural language processing

What you'll learn

You will learn how plain-text documents (such as tweets and comments) differ from structured documents (such as resumes and presentations), and where existing NLP and CV techniques fall short when applied to structured, formatted documents. You will also learn a new technique pioneered by data scientists at SAP, called Character grid or Chargrid, that can be applied to any structured document type. Finally, you will see Chargrid applied to a real-world problem: extracting structured information from invoices.

Description

Textual information is often presented in structured documents that have an inherent 2D structure. This is increasingly the case with newer types of media and communication such as presentations, websites, blogs, and formatted notebooks. In such documents, the layout, positioning, and sizing of text can be crucial to understanding its semantic content, and they strongly guide human perception.

Natural language processing (NLP) addresses the task of processing and understanding plain text. However, it processes text by serializing it, which discards any 2D structure in the document. Computer vision (CV), on the other hand, can be used to process document images. This retains the structure, but the document semantics must then be learned all the way from the image pixels. We introduce a new representation for 2D documents, the character grid (chargrid), that retains the original 2D structure while directly encoding the characters in the text. The chargrid representation can readily be used as input to, for example, deep neural networks. We apply chargrid to the task of information extraction from invoices and show that it combines the strengths of NLP and CV. Chargrid was accepted for presentation at EMNLP 2018 and is deployed in the production system of SAP Concur, where it currently processes tens of thousands of invoices every month.
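
As a rough illustration of the idea, the sketch below builds a character grid from characters with known bounding boxes. It assumes an OCR step has already produced pixel-aligned boxes; the tuple layout, the `build_chargrid` name, and the toy vocabulary are hypothetical choices for this example and are not taken from the SAP implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR output: (character, left, top, right, bottom) in pixels,
# with right and bottom exclusive.
CharBox = Tuple[str, int, int, int, int]


def build_chargrid(chars: List[CharBox], width: int, height: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Return a height x width grid of character indices.

    Every cell covered by a character's bounding box holds that character's
    index from `vocab`. Index 0 is reserved here for background and for
    characters missing from `vocab` (an assumption of this sketch).
    """
    grid = [[0] * width for _ in range(height)]
    for ch, left, top, right, bottom in chars:
        index = vocab.get(ch, 0)
        # Clip the box to the page so malformed boxes cannot index out of range.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                grid[y][x] = index
    return grid


# Toy page: 5 pixels wide, 3 pixels tall, with two characters side by side.
example_vocab = {"a": 1, "b": 2}
example_chars = [("a", 0, 0, 2, 2), ("b", 2, 0, 4, 2)]
example_grid = build_chargrid(example_chars, width=5, height=3, vocab=example_vocab)

for row in example_grid:
    print(row)
# [1, 1, 2, 2, 0]
# [1, 1, 2, 2, 0]
# [0, 0, 0, 0, 0]
```

The grid keeps each character at its position on the page, so a model sees both what the text says and where it sits. A grid like this would typically be one-hot encoded per cell before being passed to a convolutional network.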

Reference:
Chargrid: Towards Understanding 2D Documents (https://arxiv.org/pdf/1809.08799.pdf), EMNLP 2018


Anoop Katti

SAP

Anoop Katti is a data scientist in the Deep Learning center at SAP. He completed his bachelor's studies at BIT, Bangalore. After a year building telecom software at Huawei, he pursued a research-based master's in computer vision at IIT Madras. At SAP, he has worked extensively on documents with strong 2D structure, combining his prior experience in computer vision with techniques from natural language processing. Anoop holds multiple patents and publications in the field.
