Sep 9–12, 2019

Document Understanding: extracting structured information from financial images and forms

Joy Rimchala (Intuit), TJ Torres (Intuit), Xiao Xiao (Intuit), Hui Wang (Intuit)
11:55am–12:35pm Wednesday, September 11, 2019
Location: 231

Who is this presentation for?

Data scientists and students




In today’s highly automated world of financial services, consumers, the self-employed, and small business owners still face the tedious and time-consuming task of manually entering data from paper documents.

Intuit’s Document Understanding Platform orchestrates a variety of services and machine learning capabilities to process structured and unstructured documents uploaded by users, regardless of format (smartphone photos, PDFs, forms, etc.), and returns high-confidence results within the company’s product ecosystem.

The system comprises four primary components, through which each document passes dynamically depending on its use case: pre-processing of documents, optical character recognition applied to images, classification of document type, and extraction of key fields.
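As a rough illustration of how four such components might be chained, with the route chosen per document, here is a minimal Python sketch. It is not Intuit's implementation: the `Document` class, the stage functions, the keyword rule, and the "key: value" parser are all hypothetical stand-ins for real pre-processing, OCR, classification, and extraction models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container for an uploaded document and what is known about it."""
    content: str                 # stand-in for raw bytes or pixels
    is_image: bool = False
    text: str = ""
    doc_type: str = ""
    fields: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def preprocess(doc: Document) -> Document:
    # Placeholder for clean-up such as deskewing or cropping: here, trim whitespace.
    doc.content = doc.content.strip()
    doc.history.append("preprocess")
    return doc


def ocr(doc: Document) -> Document:
    # Placeholder for optical character recognition; a real system would call an OCR engine.
    doc.text = doc.content
    doc.history.append("ocr")
    return doc


def classify(doc: Document) -> Document:
    # Toy keyword rule standing in for a trained document-type classifier.
    text = doc.text or doc.content
    doc.doc_type = "invoice" if "invoice" in text.lower() else "other"
    doc.history.append("classify")
    return doc


def extract(doc: Document) -> Document:
    # Toy "key: value" line parser standing in for a key-field extraction model.
    text = doc.text or doc.content
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            doc.fields[key.strip().lower()] = value.strip()
    doc.history.append("extract")
    return doc


def plan_stages(doc: Document) -> List[Callable[[Document], Document]]:
    # Choose the route per document: in this sketch, OCR runs only on images.
    stages: List[Callable[[Document], Document]] = [preprocess]
    if doc.is_image:
        stages.append(ocr)
    stages += [classify, extract]
    return stages


def run_pipeline(doc: Document) -> Document:
    # Pass the document through each planned stage in order.
    for stage in plan_stages(doc):
        doc = stage(doc)
    return doc
```

Calling `run_pipeline(Document(content="Invoice\nTotal: 42.00", is_image=True))` routes the document through all four stages, while a non-image document skips the OCR stage. Keeping the routing decision in one function (`plan_stages`) is one simple way to let the path vary by document; a production system would likely base that decision on richer signals than a single flag.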

Intuit data scientists, Joy Rimchala, Xiao Xiao, TJ Torres and Hui Wang, will describe the design and modeling methodologies used to build the Document Understanding Platform – and share lessons learned along the way.

Prerequisite knowledge

Fundamental data science concepts and terminology

What you'll learn

Attendees will learn how to use and scale machine learning technologies to automate the categorization of documents of all types (images, PDFs, forms, etc.) and the extraction of information from them into their own systems.

Joy Rimchala


Joy is a Data Scientist in Intuit’s Machine Learning Futures Group working on ML problems in limited-label data settings. Joy holds a PhD from MIT, where she spent five years doing biological object tracking experiments and modeling them using Markov Decision Processes.

TJ Torres


After receiving his PhD in Physics, TJ began working in industry as an applied ML researcher. His previous work includes building fashion recommendation models that use computer vision to help understand visual style at Stitch Fix, as well as building models to help automatically analyze issues with sign-up conversion at Netflix. Now, at Intuit, he works on the ML Futures team, tackling research problems in the areas of CV and NLP to improve the customer experience within Intuit’s core products.

Xiao Xiao


Xiao Xiao is a Data Scientist in Intuit’s Consumer Group, using ML to enhance customer experience. Xiao holds a PhD in Ecology and an MS in Statistics; in her academic work, she applied statistical analysis to study ecological patterns at broad spatial and temporal scales.

Hui Wang


Hui Wang is a Staff Data Scientist at Intuit. Hui has a PhD in Chemical Engineering from Yale. Prior to Intuit, he conducted fundamental NLP research with grants from NIST and the CIA, and provided data modeling for investment banks and hedge funds.
