14–17 Oct 2019

NLP for healthcare: Feature engineering and model diagnostics

Manas R Kar (Episource)
16:00–16:40 Thursday, 17 October 2019
Location: Buckingham Room - Palace Suite

Who is this presentation for?

  • Data scientists, data engineers, healthcare professionals, and VPs




NLP for clinical text is an extremely challenging problem. Clinical text suffers from nonconforming language, poor or missing grammar, a wide range of mentions for the same entity, limited disambiguation, and the need for highly specific domain rules.

While building NLP solutions with high precision and recall at Episource, the company concluded that healthcare NLP requires a fair amount of feature engineering, with a special focus on incorporating domain-specific features. It’s also important to handle negations and disease mentions the way a human reader would interpret them. For example, a state-of-the-art system must be able to tell how the context of “diabetes” differs between “patient has diabetes” and “patient must follow a healthy diet to avoid diabetes.” While both sentences mention the disease, only the first is a “real” mention.
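To make the “real mention” distinction concrete, here is a minimal rule-based sketch in the spirit of NegEx-style context detection. It is not Episource’s method: the trigger lists, the `classify_mention` function, and the five-token look-back window are hypothetical choices made for illustration, and only single-word disease terms are handled.

```python
import re

# Hypothetical trigger lists for illustration only; a production system would
# rely on curated, clinically validated lexicons.
NEGATION_TRIGGERS = {"no", "denies", "without", "negative"}
HYPOTHETICAL_TRIGGERS = {"avoid", "prevent", "risk"}


def classify_mention(sentence: str, disease: str, window: int = 5) -> str:
    """Label a disease mention as 'affirmed', 'negated', 'hypothetical', or 'absent'.

    Only the `window` tokens immediately before the first occurrence of the
    (single-word) disease term are inspected for trigger words.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = disease.lower()
    if target not in tokens:
        return "absent"
    idx = tokens.index(target)
    context = tokens[max(0, idx - window):idx]
    if any(tok in NEGATION_TRIGGERS for tok in context):
        return "negated"
    if any(tok in HYPOTHETICAL_TRIGGERS for tok in context):
        return "hypothetical"
    return "affirmed"


print(classify_mention("patient has diabetes", "diabetes"))
print(classify_mention("patient must follow a healthy diet to avoid diabetes", "diabetes"))
```

The first call labels the mention “affirmed” and the second “hypothetical”, because “avoid” falls inside the look-back window. A label like this can then be fed to a downstream model as a domain-specific feature rather than treating every surface mention of a disease as positive evidence.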

Manas Ranjan Kar walks you through the challenges for NLP in a typical clinical text domain and explores the broad techniques that seem to work well for feature engineering on such problem statements. You’ll also dive into performing automated model diagnostics on an NLP model to ensure that the domain-specific feature engineering actually improved model skill.
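One simple form of automated diagnostic is an ablation check: score a model trained without the domain-specific features and one trained with them on the same gold labels, and flag whether the features actually helped. The sketch below assumes you already have binary predictions from both models (1 = real disease mention); `f1_score`, `ablation_report`, and the `min_gain` threshold are hypothetical names for illustration, not part of any library or of the talk’s actual pipeline.

```python
from typing import Dict, Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (1 = real disease mention)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ablation_report(
    gold: Sequence[int],
    baseline_pred: Sequence[int],
    featured_pred: Sequence[int],
    min_gain: float = 0.0,
) -> Dict[str, float]:
    """Compare F1 without vs. with domain features and flag whether they helped."""
    base = f1_score(gold, baseline_pred)
    feat = f1_score(gold, featured_pred)
    gain = feat - base
    return {
        "baseline_f1": base,
        "featured_f1": feat,
        "gain": gain,
        "improved": float(gain > min_gain),
    }


# Toy labels: the baseline flags every disease mention, including the
# hypothetical ones; the featured model filters them out.
gold = [1, 0, 1, 1, 0, 0]
baseline = [1, 1, 1, 1, 1, 0]
featured = [1, 0, 1, 1, 0, 0]
print(ablation_report(gold, baseline, featured))
```

On these toy labels the baseline scores an F1 of about 0.75 and the featured model 1.0. In practice the comparison would be run on a held-out set, ideally across several splits, so that a gain is not an artifact of one lucky sample.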

Prerequisite knowledge

  • A basic understanding of NLP

What you'll learn

  • Understand NLP and how it’s different for healthcare-based problem statements
  • Learn about domain-specific feature engineering for clinical text, model evaluation and setting domain-level metrics for evaluation, and automated model diagnostics to improve feature relevance

Manas R Kar


Manas Ranjan Kar is an Associate Vice President at US healthcare company Episource, where he leads the NLP and data science practice, works on semantic technologies and computational linguistics (NLP), builds algorithms and machine learning models, keeps up with data science research journals, and architects secure product backends in the cloud. He’s architected multiple commercial NLP solutions in the areas of healthcare, food and beverages, finance, and retail. Manas is deeply involved in functionally architecting large-scale business process automation and in deriving deep insights from structured and unstructured data using NLP and ML. He’s contributed to NLP libraries including gensim and ConceptNet 5 and blogs regularly about NLP on forums like Data Science Central, LinkedIn, and his blog Unlock Text. Manas speaks regularly about NLP and text analytics at conferences and meetups, such as PyCon India and PyData, has taught hands-on sessions at IIM Lucknow and MDI Gurgaon, and has mentored students from schools including ISB Hyderabad, BITS Pilani, and the Madras School of Economics. When bored, he falls back on Asimov to lead him into an alternate reality.

