A unified CV, OCR, and NLP model pipeline for scalable document understanding at DocuSign
Who is this presentation for?
- Data scientists and AI and software engineering leaders
DocuSign is the world’s largest e-signature provider, serving more than 500,000 customers and hundreds of millions of users worldwide. Roshan Satish and Michael Chertushkin explain lessons the company learned from building a deep learning model pipeline that automatically identifies the digitally fillable and signable fields in each document, such as signature boxes, checkboxes, and numeric and text fill-in fields. A major challenge is handling the immense variety of documents that DocuSign processes, which cover just about every kind of business transaction across insurance, healthcare, education, government, real estate, manufacturing, telecom, retail, employment, and legal affairs.
This task requires a modeling pipeline that unifies three usually separate problems: CV, OCR, and NLP.
Since many documents are scanned or uploaded as photographs, the system must handle image correction and preprocessing, page segmentation, and object detection. You’ll learn about the experiments and insights from applying state-of-the-art object detection models, among them SSD with MobileNet and Faster R-CNN with Inception Block.
Classifying fields requires “reading” the text around them, so the pipeline also needs trained OCR, line detection, and page segmentation models.
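To make the idea of "reading" the text around a field concrete, here is a minimal sketch in plain Python. It is not DocuSign's implementation: the `Box` and `Word` types, the `gap` and `context_text` helpers, the pixel coordinates, and the 40-pixel threshold are all hypothetical names and values chosen for illustration. It assumes an upstream detector has produced a field box and an OCR step has produced words with bounding boxes, and it simply collects the words that sit close to the field so a downstream text classifier could use them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page pixel coordinates (hypothetical layout)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Word:
    """One OCR token together with its location on the page."""
    text: str
    box: Box


def gap(a: Box, b: Box) -> float:
    """Shortest distance between two boxes; 0.0 if they touch or overlap."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def context_text(field: Box, words: list[Word], max_gap: float = 40.0) -> str:
    """Join the OCR words within max_gap pixels of the field.

    Words are ordered top-to-bottom, then left-to-right, which is only a
    rough approximation of reading order.
    """
    near = [w for w in words if gap(field, w.box) <= max_gap]
    near.sort(key=lambda w: (w.box.y0, w.box.x0))
    return " ".join(w.text for w in near)


# Assumed example data: one detected field and three OCR words.
words = [
    Word("Signature:", Box(50, 700, 140, 720)),
    Word("Date:", Box(400, 700, 440, 720)),
    Word("Terms", Box(50, 100, 100, 120)),
]
field = Box(150, 695, 350, 725)

print(context_text(field, words))  # prints "Signature:"
```

In this toy page, only the label immediately to the left of the box falls inside the threshold, which is the kind of signal a field-type classifier would need. A real system would likely weight words by direction and distance rather than use a single cutoff.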
Roshan and Michael detail results and lessons learned from annotating data, engineering features, training, and tuning the hyperparameters of the NLP models that classify field types within documents.
Critically, the unified pipeline must satisfy four requirements:
- Accuracy: state-of-the-art accuracy for the pipeline as a whole, rather than local optimization of each model
- Interpretability: visually showing what the model inferred on each document, to enable debugging and manual correction of incorrect ground truth
- Scalability: the ability to scale training and inference to many millions of documents
- Compliance: given the high sensitivity of many business documents that DocuSign handles, the entire training and inference infrastructure must be locked down and run internally within DocuSign data centers
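The difference between whole-pipeline accuracy and per-model accuracy can be illustrated with a small scoring sketch. This is an assumption-laden example, not the metric used in the talk: the `iou` and `end_to_end_f1` functions, the 0.5 overlap threshold, and the greedy matching are all hypothetical choices. A predicted field counts as correct only if its box overlaps a ground-truth field sufficiently and its type also matches, so an error in any stage lowers the score.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def end_to_end_f1(predicted, truth, min_iou=0.5):
    """F1 over (box, field_type) pairs.

    A prediction is a hit only if its box overlaps a not-yet-matched
    ground-truth field by at least min_iou AND the field type agrees.
    Matching is greedy in prediction order, which is a simplification.
    """
    unmatched = list(truth)
    hits = 0
    for p_box, p_type in predicted:
        for i, (t_box, t_type) in enumerate(unmatched):
            if p_type == t_type and iou(p_box, t_box) >= min_iou:
                hits += 1
                del unmatched[i]  # each ground-truth field matches once
                break
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


# Assumed example: both boxes are located perfectly, but one type is wrong.
truth = [((0, 0, 10, 10), "signature"), ((20, 0, 30, 10), "checkbox")]
predicted = [((0, 0, 10, 10), "signature"), ((20, 0, 30, 10), "text")]

print(end_to_end_f1(predicted, truth))  # prints 0.5
```

Here a detector evaluated on its own would score perfectly, since both boxes are found, yet the end-to-end score is only 0.5 because the field-type classifier got one of them wrong.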
The deep learning libraries and models used are all open source and based on the TensorFlow ecosystem.
Prerequisite knowledge
- Familiarity with deep learning and applications of CV, OCR, and NLP
What you'll learn
- Identify design patterns for training a multimodal CV, OCR, and NLP pipeline
- Learn the most effective neural network architectures for this type of problem, and how to address scale and interpretability at each step of the pipeline
Roshan Satish is a product manager who has been involved with artificial intelligence initiatives at DocuSign since their inception. He came to the company through the acquisition of a CLM startup, SpringCM, and worked with product leadership across the organization to formalize an AI vision before beginning to scale out the team. His job has been to create a robust, enterprise-grade deep learning platform that enables intelligence and insights across the DocuSign Agreement Cloud. Understandably, many of the use cases center on document understanding, natural language processing (NLP), and natural language understanding (NLU), but the team has also explored features leveraging CNNs as well as classical machine learning models. One of the major challenges has been working with a bare-metal tech stack while emphasizing the scalability and modularity of DocuSign’s AI services.
John Snow Labs
Michael Chertushkin is a senior data scientist at John Snow Labs. He graduated from Ural Federal University, RadioTechnical Faculty, in 2012 and worked as a software developer. In 2014 he decided to shift to data science, recognizing the growing interest in machine learning. After successfully completing several projects in the field, he decided to build more fundamental skills, which led him to graduate from the Yandex Data School, the best educational center in Russia for preparing highly skilled professionals in data science.