Document understanding: Extracting structured information from financial images and forms





Who is this presentation for?
- Data scientists and students
Level
Intermediate
Description
In today’s highly automated world of financial services, consumers, self-employed workers, and small business owners still face the tedious, time-consuming task of manually entering data from paper documents. Intuit’s document understanding platform orchestrates a variety of services and machine learning capabilities to process structured and unstructured documents uploaded by users, regardless of format (smartphone photos, PDFs, forms, etc.), and returns high-confidence results within the company’s product ecosystem.
The system has four primary components, and each document is routed through them dynamically depending on its use case: document preprocessing, optical character recognition (OCR) applied to images, document-type classification, and key-field extraction. Intuit data scientists Joy Rimchala, Xiao Xiao, TJ Torres, and Hui Wang detail the design and modeling methodologies used to build the document understanding platform and share lessons learned along the way.
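The four stages and the per-use-case routing described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Intuit's implementation: the `Document` class, the stage functions, the `PIPELINES` table, and the use-case names `photo` and `digital_pdf` are all hypothetical, and each stage is a trivial stand-in for what would be a real image-processing step or trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container passed between stages."""
    raw: str                      # stand-in for image bytes or PDF content
    text: str = ""                # filled in by OCR or embedded-text reading
    doc_type: str = "unknown"     # filled in by classification
    fields: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # stages actually run


Stage = Callable[[Document], Document]


def preprocess(doc: Document) -> Document:
    # Placeholder for deskewing, cropping, denoising: here, collapse whitespace.
    doc.raw = " ".join(doc.raw.split())
    doc.trace.append("preprocess")
    return doc


def ocr(doc: Document) -> Document:
    # Placeholder: a real system would call an OCR engine on the image.
    doc.text = doc.raw
    doc.trace.append("ocr")
    return doc


def read_embedded_text(doc: Document) -> Document:
    # Digital documents already carry text, so no OCR is needed.
    doc.text = doc.raw
    doc.trace.append("read_embedded_text")
    return doc


def classify(doc: Document) -> Document:
    # Placeholder keyword rule standing in for a trained classifier.
    doc.doc_type = "w2" if "wages" in doc.text.lower() else "receipt"
    doc.trace.append("classify")
    return doc


def extract(doc: Document) -> Document:
    # Placeholder: parse "key: value" pairs separated by semicolons.
    for part in doc.text.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            doc.fields[key.strip().lower()] = value.strip()
    doc.trace.append("extract")
    return doc


# Assumed routing table: which stages a document passes through per use case.
PIPELINES: Dict[str, List[Stage]] = {
    "photo": [preprocess, ocr, classify, extract],
    "digital_pdf": [read_embedded_text, classify, extract],
}


def run(doc: Document, use_case: str) -> Document:
    """Route a document through the stages configured for its use case."""
    for stage in PIPELINES[use_case]:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run(Document(raw="Wages:  52000 ;  Employer: Acme"), "photo")
    print(result.doc_type, result.fields, result.trace)
```

Keeping the routing in a table rather than hard-coding the stage order is one simple way to let different document use cases skip or reorder stages; a production system would likely add confidence scores and error handling at each step.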
Prerequisite knowledge
- A working knowledge of data science concepts and terminology
What you'll learn
- Learn to use and scale machine learning technologies to automate the categorization of documents of all types and the extraction of their data into your systems

Joy Rimchala
Intuit
Joy Rimchala is a data scientist in Intuit’s Machine Learning Futures Group working on ML problems in limited-label data settings. Joy holds a PhD from MIT, where she spent five years doing biological object tracking experiments and modeling them using Markov decision processes.

TJ Torres
Intuit
TJ Torres is a data scientist at Intuit, where he works on the ML futures team tackling research problems in computer vision (CV) and natural language processing (NLP) to improve the customer experience within Intuit’s core products. Previously, he worked as an applied ML researcher, building fashion recommendation models that use computer vision to help understand visual style at Stitch Fix and building models to help automatically analyze issues with sign-up conversion at Netflix. He holds a PhD in physics.

Xiao Xiao
Intuit
Xiao Xiao is a data scientist in Intuit’s Consumer Group, using ML to enhance customer experience. Xiao holds a PhD in ecology and an MS in statistics; in that work, she applied statistical analysis to study ecological patterns at broad spatial and temporal scales.

Hui Wang
Intuit
Hui Wang is a staff data scientist at Intuit. Previously, he conducted fundamental natural language processing (NLP) research with grants from the National Institute of Standards and Technology (NIST) and the CIA and provided data modeling for investment banks and hedge funds. Hui has a PhD in chemical engineering from Yale.