Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Securely building deep learning models for digital health data

Josh Patterson (Skymind), Vartika Singh (Cloudera), Dave Kale (Skymind), Tom Hanlon (Functional Media)
1:30pm–5:00pm Tuesday, September 26, 2017
Artificial Intelligence
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning, Healthcare
Average rating: 2.00 (1 rating)

Who is this presentation for?

  • Data scientists, engineers, and healthcare experts

Prerequisite knowledge

  • A basic understanding of Spark and deep learning

Materials or downloads needed in advance

  • A laptop
  • A GitHub account

What you'll learn

  • Learn how to build and successfully train multitask neural networks to predict multiple clinical outcomes simultaneously from publicly available digital health data


Applying deep learning in nontraditional data domains, such as electronic health record (EHR) data and medical imagery, presents a variety of challenges, including the friction that practitioners experience when transitioning from traditional data science tasks and tools to training complex neural network architectures. The areas in which deep learning reigns supreme, such as computer vision and natural language, often require less data exploration and utilize well-known preprocessing (e.g., whitening or one-hot encoding). The jump from raw data to model development is short, and practitioners can easily bridge the gap using mature, off-the-shelf tools. This makes it straightforward to build a reusable experimental pipeline that feeds into, e.g., a distributed environment designed to optimize performance over a large number of possible model architectures with little or no manual intervention.
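To make the "well-known preprocessing" concrete, here is a minimal sketch of one-hot encoding with pandas; the column name and values are invented for illustration and are not drawn from the session materials:

```python
import pandas as pd

# Hypothetical categorical feature; one-hot encoding expands it into
# one binary indicator column per category.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
encoded = pd.get_dummies(df, columns=["color"])

print(sorted(encoded.columns))
# → ['color_blue', 'color_green', 'color_red']
```

Because steps like this are standardized and largely data-independent, they slot easily into an automated pipeline.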

In contrast, in data domains like healthcare, there is a wide gulf between initial data exploration and downstream model development. Health data requires significantly more upfront analysis to determine properties like data types and distributions and to identify outliers and missing values. This data likewise requires the creation of more complex and typically ad hoc preprocessing pipelines. This work is best performed in an interactive environment in which practitioners can iteratively ask and answer data-driven questions, quickly view their results alongside their code, make plots and graphs, and record inline notes. However, the transition from this sort of interactive environment to developing and training large-scale neural network models is often bumpy, requiring the practitioner to switch development environments and refactor their piecemeal analyses into a pipeline that can be connected to an offline model training framework.
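The kind of upfront analysis described above can be sketched with a toy pandas example; the column names, values, and cleaning choices below are invented for illustration, not taken from the session's dataset:

```python
import numpy as np
import pandas as pd

# Toy "clinical" table exhibiting the problems described above:
# missing values and a physiologically implausible outlier.
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29],
    "heart_rate": [72, 480, 65, np.nan],  # 480 bpm is not plausible
})

# Interactive exploration: inspect dtypes and missingness
print(df.dtypes)
print(df.isna().sum())

# Ad hoc cleaning: clip implausible values, then impute missing
# entries with each column's median
df["heart_rate"] = df["heart_rate"].clip(upper=220)
df = df.fillna(df.median())
```

Each dataset tends to need its own version of these decisions (which values are implausible, how to impute), which is why such pipelines resist the off-the-shelf treatment that vision and language data enjoy.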

Cloudera Workbench provides a practitioner with a smooth transition from interactive data exploration to building pipelines to executing large-scale deep learning on a traditional Hadoop cluster. The Workbench provides a secure, isolated environment for model development and collaboration and can help accelerate data science from exploration to production.

Josh Patterson, Vartika Singh, David Kale, and Tom Hanlon walk you through interactively developing and training deep neural networks to analyze digital health data using the Cloudera Workbench and Deeplearning4j (DL4J). You’ll learn how to use the Workbench to rapidly explore real-world clinical data, build data-preparation pipelines, and launch training of neural networks.
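As a conceptual sketch of the multitask idea (not the DL4J pipeline covered in the session), a multitask network shares a learned representation and attaches one output head per clinical outcome, so several outcomes are predicted from the same features. The task names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 4

# Shared weights feed both task-specific heads
W_shared = rng.normal(size=(n_features, n_hidden))
W_task_a = rng.normal(size=(n_hidden, 1))  # e.g., in-hospital mortality (hypothetical)
W_task_b = rng.normal(size=(n_hidden, 1))  # e.g., prolonged length of stay (hypothetical)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = np.tanh(x @ W_shared)  # shared representation
    return sigmoid(h @ W_task_a), sigmoid(h @ W_task_b)

x = rng.normal(size=(1, n_features))
p_a, p_b = forward(x)  # one probability per outcome
```

Training jointly on both heads lets the shared layer learn features useful across outcomes, which is especially valuable when labels for any single outcome are scarce.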


Josh Patterson


Josh Patterson is the director of field engineering for Skymind. Previously, Josh ran a big data consultancy, worked as a principal solutions architect at Cloudera, and was an engineer at the Tennessee Valley Authority, where he was responsible for bringing Hadoop into the smart grid during his involvement in the openPDC project. Josh is a cofounder of the DL4J open source deep learning project and is a coauthor of Deep Learning: A Practitioner’s Approach. Josh has over 15 years’ experience in software development and continues to contribute to projects such as DL4J, Canova, Apache Mahout, Metronome, IterativeReduce, openPDC, and JMotif. Josh holds a master’s degree in computer science from the University of Tennessee at Chattanooga, where he did research in mesh networks and social insect swarm algorithms.


Vartika Singh


Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.


Dave Kale


David Kale is a deep learning engineer at Skymind and a PhD candidate in computer science at the University of Southern California, where he is advised by Greg Ver Steeg of the USC Information Sciences Institute. His research uses machine learning to extract insights from digital data in high-impact domains, such as healthcare, and he collaborates with researchers from the Stanford Center for Biomedical Informatics Research and the YerevaNN Research Lab. Recently, David pioneered the application of deep learning to modern electronic health records data. At Skymind, he works with clients and partners to develop and deploy deep learning solutions for real-world problems. David co-organizes the Machine Learning for Healthcare Conference (MLHC) and has served as a judge in several XPRIZE competitions, including the upcoming IBM Watson AI XPRIZE. He is the recipient of the Alfred E. Mann Innovation in Engineering Fellowship.


Tom Hanlon

Functional Media

Tom Hanlon is a senior instructor at Functional Media, where he delivers courses on the wonders of the Hadoop ecosystem. Before beginning his relationship with Hadoop and large distributed data, he had a happy and lengthy relationship with MySQL with a focus on web operations. He has been a trainer for MySQL, Sun, and Percona.