Extensive research has been conducted at the intersection of machine learning and healthcare. With an anticipated 48% annual growth in healthcare data, building scalable healthcare solutions is more crucial now than ever before. Fortunately, there has been a comparable surge in the algorithms, software packages, optimization techniques, and hardware available for extracting insights from healthcare data.
However, despite these advances, little research has examined the challenges and considerations involved in transforming research prototypes into real-world healthcare solutions. Machine learning research is often conducted in silos: researchers build prototypes, which are then handed to software developers who build scalable versions of them. As a result, the trade-offs involved in moving from a research prototype to a deployable healthcare solution are poorly understood.
Rachita Chandra outlines challenges and considerations for transforming a research prototype built for a single machine into a deployable healthcare solution that leverages Spark in a distributed environment. The original prototype, which tackled prediction of healthcare costs, worked well for a dataset of 5 million users in a nondistributed environment. It utilized several Python data science libraries and machine learning models. However, as the dataset grew beyond 1 TB, computational resources became the bottleneck, and the need to adopt Spark became apparent. Because the research prototype was already mature, porting components of the existing pipeline to Spark was more effective than building a Spark codebase from scratch. The deployable solution is an end-to-end multitenant enterprise application comprising several components: user authentication, request handling, data transformations, quality checks, analytics, machine learning modules, a visualization interface, and error handling.
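As an illustration only, and not the approach taken in the talk, one common way to port a pipeline component without rewriting it is to express it as a per-chunk function plus a merge function, so the same logic can run on one machine or over Spark partitions. The sketch below assumes a hypothetical record layout of (member_id, cost) pairs; the function names are invented for this example, and it uses only the Python standard library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Claim = Tuple[str, float]    # (member_id, cost): hypothetical record layout
Partial = Tuple[float, int]  # (total_cost, claim_count)


def partial_cost_sums(claims: Iterable[Claim]) -> Iterator[Tuple[str, Partial]]:
    """Aggregate one chunk of claims into per-member (total, count) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for member_id, cost in claims:
        totals[member_id] += cost
        counts[member_id] += 1
    for member_id in totals:
        yield member_id, (totals[member_id], counts[member_id])


def merge_partials(a: Partial, b: Partial) -> Partial:
    """Combine two partial aggregates for the same member."""
    return a[0] + b[0], a[1] + b[1]


def mean_cost_per_member(chunks: Iterable[Iterable[Claim]]) -> Dict[str, float]:
    """Single-machine driver: treats each chunk as if it were a partition."""
    merged: Dict[str, Partial] = {}
    for chunk in chunks:
        for member_id, partial in partial_cost_sums(chunk):
            if member_id in merged:
                merged[member_id] = merge_partials(merged[member_id], partial)
            else:
                merged[member_id] = partial
    return {m: total / count for m, (total, count) in merged.items()}


# Two chunks standing in for two partitions of a larger dataset.
chunks = [
    [("a", 100.0), ("b", 50.0)],
    [("a", 300.0)],
]
result = mean_cost_per_member(chunks)
```

Because the two functions do not depend on how the data is split, the same pair could in principle be passed to PySpark's `RDD.mapPartitions` and `RDD.reduceByKey`, followed by `mapValues` to compute the mean, in place of the single-machine driver.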
Rachita Chandra is a solutions architect at IBM Watson Health, where she brings together end-to-end machine learning solutions in healthcare. She has experience implementing large-scale, distributed machine learning algorithms. Rachita holds both master’s and bachelor’s degrees in electrical and computer engineering from Carnegie Mellon.
©2018, O'Reilly Media, Inc.