The Meaningful Use of Natural Language Processing in Healthcare

Josh Wills (Cloudera), Kenneth Kolenik (Nuance Healthcare), Jake Marcus (Practice Fusion), Jacob Miller (Explorys), Nigam Shah (Stanford School of Medicine)
Sponsored Sessions
Location: Yosemite C

Even with the advent of sophisticated EHR systems, the vast majority of information about patient care is still recorded as free text, and this seems unlikely to change in the near future. Much of the recent interest in natural language processing (NLP) techniques in healthcare derives from the need to map enormous amounts of textual information onto the new, far more detailed ICD-10 coding standard without dramatically increasing the costs imposed on healthcare practitioners. Additionally, extracting structured features from the text is a necessary prerequisite for the advanced analytics and predictive models that we would like to deploy in order to improve clinical outcomes and patient care. Even so, natural language processing in healthcare remains extremely challenging, owing to the sheer volume of specialized medical terminology and clinical care procedures.

This panel will discuss the path towards the meaningful use of natural language processing to improve patient care and clinical outcomes, both intrinsically and as a necessary input for predictive analytics. We are gathering a diverse group of data scientists with experience in EHRs, clinical data analysis, and natural language processing to discuss:

  1. The present state of natural language processing in healthcare, from promising use cases to its current limitations,
  2. The right path forward for natural language technology in healthcare, whether via coding standards, researching patient outcomes, or clinical decision support,
  3. The relationship between advances in NLP and tools for large-scale data analysis like Hadoop and HBase,
  4. NLP and the sociology of medicine: ways that we can create a virtuous cycle of improvements to the technology and real benefits for practitioners.

This session is sponsored by Cloudera

Josh Wills


Josh Wills is director of data science at Cloudera, where he works with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to joining Cloudera, Josh was at Google where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. He earned his bachelor’s degree in mathematics from Duke University and his master’s in operations research from the University of Texas-Austin.

Kenneth Kolenik

Nuance Healthcare

Ken Kolenik is a Senior Business Program Manager for Nuance Communications, Inc., responsible for product management of Nuance’s Clinical Language Understanding technology and the application of that technology with the University of Pittsburgh Medical Center’s (UPMC) Technology Development Center and other strategic clinical partners. Over the past 16 years, Ken has defined and commercialized technology to increase the value of data and improve business process.

Prior to Nuance, Ken was Director of Product Management at invivodata, Inc., where he developed solutions for the pharmaceutical industry to bring new drugs to market faster and with fewer issues through improved accuracy and validity of clinical trial data collected from patients and clinicians. Ken also has extensive experience with speech applications and using speech interfaces and data to improve business. As a product manager at Vocollect, Inc., Ken defined and delivered improved accuracy and productivity for industrial and mobile workers through voice-directed work applications. Ken has used his expertise with technology, data, product management, and product marketing to commercialize products across various industries throughout the world.

Ken holds a BA in Journalism from Penn State University and an MBA from the Duquesne University School of Business.

Jake Marcus

Practice Fusion

Jake Marcus was the first employee at Practice Fusion tasked with analyzing the company’s clinical data, the electronic health records of millions of patients. He uses this data to detect disease outbreaks, identify dangerous drug interactions and compare the effectiveness of competing treatments. Before Practice Fusion, Jake was a fellow at the Institute for Health Metrics and Evaluation, where he developed methods to estimate child and adult mortality in 187 countries over the past 40 years. He has been published multiple times in The Lancet, PLoS Medicine and other prestigious medical journals. His work has also appeared in the popular press, including the New York Times, The Economist, and the Washington Post. Jake is a passionate advocate for using data to improve health. He has championed the cause through the New Republic, where he has written on healthcare, and through Dr. Oz’s “15 Minute Physicals”. He graduated from Yale University with a BA in mathematics.

Jacob Miller


Jacob Miller is a Data Scientist at Explorys who is passionate about using data to improve the quality of health care. The Informatics group in which he works develops Hadoop-based analytic solutions to difficult big data problems. Recently, he has been working on predicting costs for episodes of care, reducing error rates in patient-matching across diverse data sources, and creating data visualizations. He earned his MS in Operations Research from The Ohio State University.

Nigam Shah

Stanford School of Medicine

Dr. Nigam H. Shah is an Assistant Professor of Medicine (Biomedical Informatics) at the Stanford School of Medicine. His group builds ontology-based applications to improve search, integration and use of unstructured biomedical information. Recently, his group has shown that by mining unstructured clinical notes for patterns of mentions of disease conditions, drug names and the temporal ordering of the drugs and diseases, it is possible to detect the known associations between drugs and their adverse effects, on average 2 years before their respective safety alerts are issued. He teaches on the topics of how to use biomedical ontologies for data mining as well as on current trends and future directions in data-driven medicine.

Dr. Shah has been an invited speaker at several international conferences on the topic of applying ontologies in health and life sciences. He has co-chaired the Bio-Ontologies meeting at the Intelligent Systems in Molecular Biology conference since 2008 and chaired the American Medical Informatics Association’s Summit on Translational Bioinformatics in 2012. He holds an MBBS from Baroda Medical College, India, and a PhD from Penn State University, and completed postdoctoral training at Stanford University.


Harry Mark
08/04/2013 4:40pm PDT

Running health care organizations comes with a lot of financial, legal, and compliance risks, and providers have to be aware of each of these aspects to stay in business. There are some service providers, such as Health Security Solutions, that can help organizations in these endeavors. I think when these basic objectives are in order, they can move to natural language processing to help with patient care and clinical outcomes.

