Put AI to Work
April 15-18, 2019
New York, NY

Chargrid: Understanding 2D documents

Anoop Katti (SAP)
2:40pm-3:20pm Thursday, April 18, 2019
Machine Learning, Models and Methods
Location: Grand Ballroom West
Secondary topics:  Computer Vision, Models and Methods, Text, Language, and Speech
Average rating: 4.60 (5 ratings)

Who is this presentation for?

  • ML researchers and practitioners and data scientists

Level

Intermediate

Prerequisite knowledge

  • Familiarity with deep learning (CNN, RNN), object detection in computer vision, and sequence modeling in natural language processing

What you'll learn

  • Understand the shortcomings of the existing techniques from NLP and CV when applied to structured and formatted documents
  • Explore Chargrid, a new technique pioneered by the data scientists at SAP that can be applied to any structured document type

Description

Textual information is often represented through structured documents, which have an inherent 2D structure, particularly with the advent of new types of media and communications such as presentations, websites, blogs, and formatted notebooks. In such documents, layout, positioning, and sizing can be crucial to understanding the semantic content, and they provide strong guidance for human perception.

Natural language processing (NLP) addresses the task of processing and understanding plain text. However, it processes text by serializing it, completely ignoring any 2D structure in the text. Computer vision (CV), on the other hand, can be used to process document images; it retains the structure but has to learn the document semantics from the image pixels rather than from the text itself.

Anoop Katti explores the shortcomings of existing techniques for understanding 2D documents and offers an overview of the Character Grid (Chargrid), a new processing pipeline pioneered by data scientists at SAP that retains the original 2D structure while directly encoding the characters in the text. The resulting representation can readily be used as input to models such as deep neural networks. Anoop applies Chargrid to the task of information extraction from invoices to show how it captures the best of both NLP and CV.
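
As a rough illustration of the idea of keeping the 2D layout while encoding characters directly, the following Python sketch fills each character's bounding box in a grid with an integer index for that character and then one-hot encodes the grid so it could be fed to a convolutional network. The alphabet, the (char, x0, y0, x1, y1) box format, and the function names are assumptions made for this example and are not taken from the SAP implementation; steps such as OCR, downsampling, and the network itself are left out.

```python
import string

import numpy as np

# Assumed alphabet for this sketch; index 0 is reserved for empty background.
ALPHABET = string.ascii_lowercase + string.digits + ".,:-/$"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1  # any character outside the alphabet
NUM_CLASSES = UNKNOWN_INDEX + 1    # background + alphabet + unknown


def build_chargrid(char_boxes, height, width):
    """Paint each character's box with that character's integer index.

    char_boxes holds hypothetical OCR results as (char, x0, y0, x1, y1),
    with x1 and y1 exclusive, in grid coordinates.
    """
    grid = np.zeros((height, width), dtype=np.int64)
    for ch, x0, y0, x1, y1 in char_boxes:
        grid[y0:y1, x0:x1] = CHAR_TO_INDEX.get(ch.lower(), UNKNOWN_INDEX)
    return grid


def one_hot(grid):
    """Turn an (H, W) index grid into an (H, W, NUM_CLASSES) float tensor."""
    return np.eye(NUM_CLASSES, dtype=np.float32)[grid]


# A tiny made-up "page": "No" in the top-left corner, "42" lower right.
char_boxes = [
    ("N", 0, 0, 2, 2),
    ("o", 2, 0, 4, 2),
    ("4", 8, 3, 10, 5),
    ("2", 10, 3, 12, 5),
]
grid = build_chargrid(char_boxes, height=6, width=12)
encoded = one_hot(grid)
print(grid)
print(encoded.shape)
```

Unlike a serialized token sequence, the grid keeps the relative position of "No" and "42" on the page, and unlike an image, each cell states which character it belongs to rather than leaving that to be inferred from pixels.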

Chargrid was accepted for presentation at EMNLP 2018 and is deployed in the production system of SAP Concur, where it currently processes tens of thousands of invoices every month.

Anoop Katti

SAP

Anoop Katti is a data scientist in the Deep Learning Center at SAP, where he combines computer vision with techniques from natural language processing to work on documents with strong 2D structure. Previously, he built telecom software at Huawei. Anoop holds multiple patents and publications in the field. He completed his bachelor's studies at BIT, Bangalore, and pursued a research-based master’s in computer vision at IIT Madras.