Breast cancer is a leading cause of death in women. It affects 12% of all women, and 30–40% of patients die despite surgery. Survival rates increase with early detection, which gives pathologists and the wider medical community a strong incentive to detect cancer more quickly. The primary driver of early detection is the analysis of cancer proliferation, the rate at which tumor cells grow.
Michael Dusenberry and Frederick Reiss share their experience using deep learning to predict tumor proliferation scores from high-resolution micrographs of tumor tissue. Scale, in terms of both data and model size, is key to achieving high accuracy in this domain. Michael and Frederick demonstrate how they use Apache SystemML’s model parallelism to scale the size of the model and Apache Spark’s data parallelism to scale the size of the training data. They then walk you through their implementation of the training pipeline and present results from a seven-terabyte dataset.
Mike Dusenberry is an engineer at the IBM Spark Technology Center, where he is creating a deep learning library for SystemML and working to make deep learning performant at scale. Mike was on his way to an MD and a career as a physician in his home state of North Carolina when he teamed up with professors on a medical machine-learning research project. Two years later, Mike is in San Francisco, contributing to Apache SystemML as a committer and researching medical applications for deep learning.
Fred Reiss is chief architect and one of the founding employees of the IBM Spark Technology Center in San Francisco. Previously, Fred spent nine years at IBM Research Almaden, where he worked on the SystemML and SystemT projects as well as on the research prototype of DB2 with BLU Acceleration. He has over 25 peer-reviewed publications and six patents. Fred holds a PhD from UC Berkeley.