Sep 23–26, 2019

Improving OCR Quality of Documents using Generative Adversarial Networks

1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 06/07
Secondary topics: Deep Learning, Financial Services, Health and Medicine

Who is this presentation for?

Directors / VPs of Operations involving document processing

Level

Intermediate

Description

Even as we move toward a digital world, paper-based document processing remains an integral part of business processes across a wide range of domains, from finance to healthcare.

Traditionally, these documents are processed manually. NLP-based document processing solutions make this work significantly more efficient by automating document classification, extraction, or search. Optical character recognition (OCR) is a crucial step in any document processing task because it makes the text machine-readable so that it can be parsed. However, document quality issues such as blurred text, fragmented characters, merged characters, low resolution, skew, and noise can severely degrade OCR accuracy. In particular, images are corrupted by stains, wrinkles, and the pixel and impulse noise added during scanning, and poor resolution causes character strokes to merge with the document background.
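As a rough illustration of two of the degradations described above (low resolution and impulse noise), the sketch below degrades a clean grayscale image represented as a list of pixel rows. It is not the presenters' data pipeline: the function names `downsample`, `add_impulse_noise`, and `make_training_pair`, the block-averaging approach, and the default parameters are all assumptions chosen for illustration, and it uses pure Python only to stay dependency-free.

```python
import random


def downsample(img, factor):
    """Average non-overlapping factor x factor blocks to simulate low resolution."""
    height, width = len(img), len(img[0])
    out = []
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [img[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def add_impulse_noise(img, prob, rng):
    """Replace each pixel with black (0) or white (255) with probability `prob`."""
    noisy = []
    for row in img:
        noisy.append([rng.choice((0, 255)) if rng.random() < prob else px
                      for px in row])
    return noisy


def make_training_pair(clean, factor=2, prob=0.05, seed=0):
    """Return (degraded, clean): a hypothetical input/target pair for training."""
    rng = random.Random(seed)  # seeded so the degradation is reproducible
    degraded = add_impulse_noise(downsample(clean, factor), prob, rng)
    return degraded, clean


# Usage: an 8x8 white page with one black horizontal stroke.
page = [[255] * 8 for _ in range(8)]
page[3] = [0] * 8
degraded, target = make_training_pair(page, factor=2, prob=0.1, seed=42)
print(len(degraded), "x", len(degraded[0]), "degraded input")
```

Note how averaging blends the one-pixel stroke into the background, which is the stroke-merging effect that poor resolution produces.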

To overcome these challenges, we have developed a generative adversarial network (GAN) that enhances the resolution of scanned images and denoises them. The model was trained on custom-generated data combined with externally available datasets. Once trained, the generator can increase the resolution of a document by a predefined factor, sharpen character borders, increase contrast, and eliminate pixel noise while preserving the edge features of characters. The model significantly improves OCR accuracy at both the word level and the character level, using open-source as well as commercial OCR systems. Implementing our GAN model also helped us achieve significant efficiencies in multiple document processing solutions, from finance to healthcare, and improved the accuracy of handwriting detection APIs.
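The abstract does not say how word-level and character-level accuracy were measured. One common approach, shown here purely as an assumption, is to compute the edit distance between the ground-truth text and the OCR output and normalize by the reference length, giving a character error rate and a word error rate. The function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, ocr_output):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


def word_error_rate(reference, ocr_output):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must not contain zero words")
    return edit_distance(ref_words, ocr_output.split()) / len(ref_words)


# Usage: compare OCR output on an original scan and on an enhanced scan.
truth = "invoice total 100"
ocr_before = "invoice tota1 1O0"   # made-up OCR output, for illustration only
ocr_after = "invoice total 100"
print(char_error_rate(truth, ocr_before), word_error_rate(truth, ocr_before))
print(char_error_rate(truth, ocr_after), word_error_rate(truth, ocr_after))
```

Running the same comparison before and after enhancement on a labeled set of scans gives a simple way to quantify any gain at both levels.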

During this session, we will discuss how GANs can bring efficiency and accuracy to document processing pipelines. In particular, we will present how the dataset was created for this task, the model architecture, and the training methodology. We will showcase the performance of the model across multiple document processing applications and discuss potential next steps.

Prerequisite knowledge

Business Perspective: No prerequisite required
Technical Perspective: Basic knowledge of deep learning and GANs

What you'll learn

Improving document quality and OCR accuracy using GANs

Nagendra Shishodia

EXL

Nagendra leads the Analytics Product Development initiative for EXL. He has over 17 years of experience in developing advanced analytics solutions across business functions. His focus has been on developing solutions that enable better decision making through the use of machine learning, natural language processing, and big data technologies. Nagendra consults with senior executives of global firms across industries, including healthcare, insurance, banking, retail, and travel. Nagendra holds an MS degree from Purdue University, IN and a B.Tech. from IIT Bombay. At EXL, Nagendra has written thought leadership articles on healthcare clinical solutions and AI.

Chaithanya Manda

EXL

Chaithanya is an Assistant Vice President at EXL Service. He has over 10 years of experience in developing advanced analytics solutions across multiple business domains. He holds a bachelor of technology degree from IIT Guwahati. At EXL, he is responsible for building AI-enabled solutions that bring efficiencies across various business processes.

Solmaz Torabi

EXL

Solmaz Torabi is a Data Scientist at EXL Service. She holds a PhD in Electrical and Computer Engineering from Drexel University. At EXL, she is responsible for building image and text analytics models using deep learning methods to extract information from images and documents.
