Sep 23–26, 2019

Improving OCR quality of documents using generative adversarial networks

1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 06/07

Who is this presentation for?

  • Directors and vice presidents of operations involving document processing




Even as we move toward a digital world, paper-based document processing remains an integral part of business processes in domains ranging from finance to healthcare. Traditionally, these documents are processed manually; natural language processing (NLP)-based solutions make this work significantly more efficient by automating document classification, extraction, and search. Optical character recognition (OCR) is a crucial step in any document processing task because it lets machines read the text and parse the information it contains. However, document quality issues such as blurred text, fragmented characters, merged characters, low resolution, skew, and noise can severely degrade OCR accuracy. In particular, images are corrupted by stains, wrinkles, pixel noise, and impulse noise introduced during scanning, and poor resolution causes character strokes to merge with the document background.
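To make the degradations described above concrete, here is a minimal sketch in plain Python that simulates two of them on a tiny grayscale image: loss of resolution (by block averaging) and impulse noise (random black or white pixels). The function names, the 4x4 test image, and the parameter values are illustrative assumptions, not the speakers' data generation pipeline.

```python
import random


def add_impulse_noise(image, prob, rng):
    """Replace each pixel with black (0) or white (255) with probability `prob`."""
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < prob:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


def downsample(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - height % factor, factor):
        out_row = []
        for left in range(0, width - width % factor, factor):
            block = [
                image[top + dy][left + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


def degrade(image, factor=2, noise_prob=0.05, seed=0):
    """Hypothetical helper: lower the resolution, then add impulse noise."""
    rng = random.Random(seed)  # seeded so the degradation is reproducible
    return add_impulse_noise(downsample(image, factor), noise_prob, rng)


# A tiny 4x4 "page": white background (255) with a dark vertical stroke (0).
clean = [[255, 0, 0, 255] for _ in range(4)]
low_res = downsample(clean, 2)
degraded = degrade(clean)
```

In this toy image, every 2x2 block contains equal parts stroke and background, so `low_res` collapses to a uniform mid-gray: the stroke has merged with the background, which is the effect of poor resolution that the abstract describes.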

Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explain how to overcome these challenges with a generative adversarial network (GAN) that enhances the resolution of scanned images and denoises them. The model was trained on custom-generated data combined with externally available datasets. Once trained, the generator can increase the resolution of a document by a predefined factor, sharpen character borders, increase contrast, and eliminate pixel noise while preserving the edge features of characters. The model significantly improves OCR accuracy at both the word and character level, with open source as well as commercial OCR systems. Implementing the GAN model also helped the speakers' company achieve significant efficiencies in multiple document processing solutions from finance to healthcare, and it improved the accuracy of handwriting detection APIs.
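Word-level and character-level OCR accuracy are commonly reported through word error rate and character error rate, both based on edit distance between the OCR output and a ground-truth transcription. The sketch below shows one common way to compute them; it is an assumption about how such accuracy might be measured, not the evaluation code used in the talk, and the sample strings are made up for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Made-up example: ground truth versus OCR output from a noisy scan.
ground_truth = "invoice total 1500"
ocr_output = "inv0ice tota1 1500"

cer = character_error_rate(ground_truth, ocr_output)
wer = word_error_rate(ground_truth, ocr_output)
print(f"CER: {cer:.3f}, WER: {wer:.3f}")
```

Running the same comparison on OCR output from the original scan and from the enhanced image gives a before-and-after measure of how much the enhancement helps.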

You’ll explore how GANs can bring efficiency and accuracy to document processing pipelines. In particular, you’ll see how the dataset was created for this task, along with the model architecture and training methodology. The speakers then showcase the model's performance across multiple document processing applications and outline potential next steps.

Prerequisite knowledge

  • A basic understanding of deep learning and GANs

What you'll learn

  • Understand how to improve document quality and OCR accuracy using GANs

Nagendra Shishodia


Nagendra Shishodia is the head of analytics products for EXL, where he leads the analytics product development initiative and has written thought leadership articles on healthcare clinical solutions and AI. He has over 17 years of experience in developing advanced analytics solutions across business functions. His focus has been on developing solutions that enable better decision making through the use of machine learning, natural language processing, and big data technologies. Nagendra consults with senior executives of global firms across industries including healthcare, insurance, banking, retail, and travel. Nagendra holds an MS degree from Purdue University and a BTech from the Indian Institute of Technology Bombay.

Chaithanya Manda


Chaithanya Manda is an assistant vice president at EXL, where he’s responsible for building AI-enabled solutions that bring efficiencies to a range of business processes. He has over 10 years of experience in developing advanced analytics solutions across multiple business domains. He holds a bachelor of technology degree from the Indian Institute of Technology Guwahati.

Solmaz Torabi


Solmaz Torabi is a data scientist at EXL, where she’s responsible for building image and text analytics models using deep learning methods to extract information from images and documents. She holds a PhD in electrical and computer engineering from Drexel University.

