Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Utilizing Rule-Based Text Extraction with Deep Learning Models for FDA Pharmacovigilance

Tom Sabo (SAS), Qais Hatim (Center for Drug Evaluation and Research, U.S. Food and Drug Administration)
4:05pm-4:45pm Wednesday, April 17, 2019
Case Studies, Machine Learning
Location: Sutton South
Secondary topics:  AI case studies, Health and Medicine, Models and Methods, Text, Language, and Speech

Who is this presentation for?

Data Scientist, Data Analyst, Project Manager, Executive Sponsor



Prerequisite knowledge

Basic awareness of deep learning models and natural language processing.

What you'll learn

How to use traditional rule-based text extraction models to build and refine training data for deep learning (CRF/RNN) models. How to apply AI to an end-to-end use case that enhances pharmacovigilance at the Food and Drug Administration. The same methodology can be applied to other adverse-event or complaint-oriented scenarios in the government and commercial sectors.


The Medical Dictionary for Regulatory Activities (MedDRA) is a standardized set of medical terminology that facilitates the international sharing of regulatory information for medical products, including drugs. MedDRA Preferred Terms (PTs) can be used to categorize adverse events (AEs) during regulatory review and pharmacovigilance. Narratives from pre- and post-marketing AE reports are crucial to regulatory review because they can be used to discover and correct discrepancies in the reported PT information. According to reviewers, such discrepancies are not rare. However, manual review of narratives is time-consuming, and a thorough review is often hard to achieve because of the sheer volume of narratives.

In this presentation, we show how to apply AI methods to narrative data to extract terminology related to adverse events. First, we apply rule-based text analytics to develop conceptual structured metadata from unstructured narratives in an iterative process of refinement. These extractions then feed deep learning models (CRF/RNN) that approximate continuous associations between the inputs (text data) and the targets (PTs). We propose to use these results to improve the accuracy of automated MedDRA coding of AEs, including the assessment of candidate terminology not covered by the MedDRA PTs. Ultimately, this integrated analysis will improve pharmacovigilance for the Food and Drug Administration.
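As a rough illustration of the first step, the sketch below shows one way rule-based extractions could be turned into token-level training labels for a sequence model such as a CRF or RNN. It is not the presenters' implementation: the `RULES` patterns, the term labels, the BIO tagging scheme, and the function names are all assumptions made for this example, and the labels are placeholders rather than actual MedDRA content.

```python
import re

# Hypothetical rule lexicon: regex pattern -> illustrative term label.
# Placeholders only; these are not taken from MedDRA.
RULES = {
    r"\bheadaches?\b": "Headache",
    r"\bnausea\b": "Nausea",
    r"\bskin rash(es)?\b": "Rash",
}

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def rule_spans(text):
    """Return sorted (start, end, label) character spans matched by the rules."""
    spans = []
    for pattern, label in RULES.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def bio_labels(text):
    """Project rule matches onto tokens as BIO tags (B- begin, I- inside, O outside)."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in rule_spans(text):
        inside = False
        for i, (_, t_start, t_end) in enumerate(tokens):
            # A token belongs to the span if it lies fully within it.
            if t_start >= start and t_end <= end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


narrative = "Patient reported severe headache and a skin rash after the second dose."
for token, tag in bio_labels(narrative):
    print(token, tag)
```

The resulting (token, tag) pairs are the kind of labeled sequences a CRF or RNN tagger is typically trained on. Refining the rules and regenerating the labels would correspond to the iterative refinement described above.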


Tom Sabo


Tom Sabo is a Principal Solutions Architect with SAS who, since 2005, has been immersed in text analytics and artificial intelligence applied to federal government challenges. He presents work internationally on diverse topics including modeling applied to government procurement, strategies to counter human trafficking, and using analytics to leverage and predict research trends. Sabo also served on a panel for the Institute of Medicine’s Standing Committee on Health Threats Resilience to inform DHS/OHA on social media strategies. He has a bachelor’s degree in cognitive science and a master’s in computer science, both from the University of Virginia.


Qais Hatim

Center for Drug Evaluation and Research, U.S. Food and Drug Administration

Qais received dual Ph.D. degrees in operations research and industrial engineering from Pennsylvania State University, University Park, in August 2015. In his role as a computer scientist/statistician, he conducts research in statistical/operational modeling and computer science in the Office of Computational Science (OCS), part of the Office of Translational Science (OTS) in the Center for Drug Evaluation and Research (CDER) at the U.S. Food and Drug Administration (FDA). Specifically, he applies advanced statistical modeling and scientific computing techniques to computationally intensive tasks encountered in regulatory and scientific applications. To this end, he uses statistical and operations research methodologies such as machine learning and data mining algorithms, natural language processing (NLP) techniques, neural network procedures, and text analytics to extract meaning, patterns, and hidden structures from structured and unstructured data. He also identifies the most feasible approaches to software/networking system design and development problems, and consults reviewers, fellow scientists, and regulations to analyze problems and recommend technology-based solutions. He prepares reports and manuscripts based on research findings and presents at scientific meetings as necessary. Moreover, he is an active member of several working groups across FDA, such as the Modeling and Simulation Workgroup, INFORMED, and HIVE.
