Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

How the largest US healthcare dataset in Hadoop enables patient-level analytics in near real time

Navdeep Alam (IMS Health)
11:20am–12:00pm Wednesday, 09/28/2016
Hadoop use cases
Location: 3D 08 Level: Intermediate
Average rating: 4.50 (12 ratings)

Prerequisite knowledge

  • A working knowledge of the Hadoop stack (specifically HBase)
  • Basic familiarity with Spark

What you'll learn

  • Understand how the Hadoop stack, specifically HBase and Spark, can help answer complex longitudinal questions about patients, and other healthcare questions, in near real time

Description

    As healthcare data becomes more digitized, it is becoming possible to leverage electronic medical records, prescription data, medical billing records, hospital data, and other healthcare datasets to help improve health outcomes and lower the cost of care for patients in near real time. However, processing terabytes and petabytes of de-identified healthcare data requires the application of complex and ever-changing business rules. This complexity limits the ability to generate near-real-time insights and conduct research studies that could potentially influence how patients are treated.

    Today, analyzing databases of this magnitude can take days or even weeks of processing. To be more effective at improving patient care, researchers need to be able to run processes on demand and get result sets back almost instantaneously. Navdeep Alam shares his experience at IMS Health in realizing this opportunity to influence patient health outcomes in minutes to seconds and reviews current and emerging technologies in the marketplace that can work with unbounded, de-identified patient datasets in the billions of rows in an efficient and scalable way.

    Navdeep Alam

    IMS Health

    Navdeep (Nav) Alam brings more than 15 years of experience in software engineering, databases, data warehousing, analytics, architecture, and development to his role as the director of global data warehousing at IMS Health. There, he is charged with managing the global data warehousing organization as a center of excellence and with defining and executing its future roadmap, which includes next-generation massively parallel processing (MPP), low-latency data warehousing systems. Nav is also a graduate teaching assistant at Boston University, where he assists in teaching graduate students enterprise computing, advanced databases, data mining, and business intelligence.

    Previously, Nav was the director of analytics and prediction for Empirix, where he led a global team in the architecture and development of its next-generation analytics platform, IntelliSight; director of data architecture for Mzinga’s social intelligence applications as part of its OmniSocial SaaS platform; and principal software engineer for KnowledgePlanet, Mzinga’s predecessor, where he was the principal architect and developer of Firefly Simulation Developer. He also worked in the Information Technology and Application Support group at Calgary’s Nova Chemical Research and Technology Center, where he provided Y2K support and developed an interactive laboratory information management system web tutorial application, and as a Unix administrator managing the Oracle data systems for Syncrude Canada and EI Processing, building filtering algorithms to scrub noise for seismic data processing. Nav holds an MS in computer science from Boston University and a bachelor’s degree in computer science from the University of Calgary.