Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML

Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM)
1:50pm–2:30pm Wednesday, March 15, 2017
Data science & advanced analytics
Location: 210 C/G
Level: Advanced
Secondary topics: Deep learning, Healthcare
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • Basic knowledge of statistics and computer science

What you'll learn

  • Explore the automatic analysis of cancer proliferation rates
  • Understand how to implement deep learning with Apache SystemML


Breast cancer is a leading cause of death in women, affecting 12% of all women, with 30–40% of patients dying despite surgery. Survival rates increase with early detection, which gives pathologists and the medical community at large a strong incentive to detect cancer sooner. The primary driver of early detection is the analysis of cancer proliferation, the rate at which tumor cells grow.

Michael Dusenberry and Frederick Reiss share their experience using deep learning to predict tumor proliferation scores from high-resolution micrographs of tumor tissue. In this domain, scale, in terms of both data size and model size, is key to achieving high accuracy. Michael and Frederick demonstrate how they use Apache SystemML's model parallelism to scale the size of the model and Apache Spark's data parallelism to scale the size of the training data, then walk you through their implementation of the training pipeline and present results from a seven-terabyte dataset.
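As a rough illustration of the data-parallel half of this setup, here is a minimal sketch, under stated assumptions, of synchronous data-parallel gradient descent in plain Python: each simulated "worker" computes a partial gradient on its own shard of the training data, and a driver combines those partials before updating a shared model. It uses a toy linear-regression model and synthetic data rather than SystemML, Spark, or the speakers' actual pipeline; the helper names `shard`, `worker_gradient`, and `train` are hypothetical.

```python
"""Toy simulation of synchronous data-parallel gradient descent.

Hypothetical illustration only: a 1-D linear model y = w * x + b is trained
on shards of a synthetic dataset, mimicking how a Spark-style driver could
combine per-partition gradients. This is not SystemML or the speakers' code.
"""
from typing import List, Tuple

Example = Tuple[float, float]  # one (x, y) training pair


def shard(data: List[Example], num_workers: int) -> List[List[Example]]:
    """Split the dataset round-robin into one shard per (simulated) worker."""
    return [data[i::num_workers] for i in range(num_workers)]


def worker_gradient(params: Tuple[float, float],
                    part: List[Example]) -> Tuple[float, float, int]:
    """Sum of squared-error gradients over one shard, plus the shard size."""
    w, b = params
    gw = gb = 0.0
    for x, y in part:
        err = (w * x + b) - y  # prediction error for this example
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(part)


def train(data: List[Example], num_workers: int = 4,
          lr: float = 0.01, steps: int = 500) -> Tuple[float, float]:
    params = (0.0, 0.0)
    parts = shard(data, num_workers)
    for _ in range(steps):
        # "Map": each worker computes a partial gradient on its own shard.
        partials = [worker_gradient(params, p) for p in parts]
        # "Reduce": the driver sums the partials and divides by the total
        # count, yielding the full-batch mean-squared-error gradient.
        total_n = sum(n for _, _, n in partials)
        gw = sum(g for g, _, _ in partials) / total_n
        gb = sum(g for _, g, _ in partials) / total_n
        w, b = params
        params = (w - lr * gw, b - lr * gb)
    return params


if __name__ == "__main__":
    # Synthetic data drawn exactly from y = 3x + 1.
    data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-20, 21)]
    w, b = train(data)
    print(f"w={w:.3f}, b={b:.3f}")
```

Because the per-shard sums are added before dividing by the total example count, each update equals an ordinary full-batch gradient step; that equivalence is what allows data parallelism to spread a larger training set across machines without changing the optimization itself.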


Michael Dusenberry

IBM Spark Technology Center

Mike Dusenberry is an engineer at the IBM Spark Technology Center, where he is creating a deep learning library for SystemML and working toward high-performance deep learning at scale. Mike was on his way to an MD and a career as a physician in his home state of North Carolina when he teamed up with professors on a medical machine-learning research project. Two years later, Mike is in San Francisco, contributing to Apache SystemML as a committer and researching medical applications of deep learning.


Frederick Reiss

IBM

Fred Reiss is chief architect and one of the founding employees of the IBM Spark Technology Center in San Francisco. Previously, Fred spent nine years at IBM Research Almaden, where he worked on the SystemML and SystemT projects as well as on the research prototype of DB2 with BLU Acceleration. He has over 25 peer-reviewed publications and six patents. Fred holds a PhD from UC Berkeley.