Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo

Sajan Govindan (Intel)

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 06/07

Data Science, Machine Learning, & AI

Secondary topics: Deep Learning

Who is this presentation for?

Big data analytics and AI architects, data engineers, data scientists, enterprise analytics, and AI decision makers

Level

Beginner

Description

Sajan Govindan explains how CERN applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for high energy physics, using the open source BigDL and Analytics Zoo software running on Intel Xeon-based distributed clusters. Sajan outlines technical details and development insights through an example: a topology classifier built to improve real-time event selection at the Large Hadron Collider (LHC). The classifier achieved high efficiency while also reducing the false-positive rate compared to existing methods. It could be used as a filter in the online event selection infrastructure of the LHC experiments, allowing a more flexible and inclusive selection strategy while reducing the downstream resources wasted on processing false positives.

This work is part of CERN’s research on applying deep learning and analytics using open source and industry-standard technologies as an alternative to the existing customized rule-based methods. Sajan explores how CERN could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, open source frameworks that unify analytics and AI on Spark, with easy-to-use APIs and development interfaces that integrate with big data platforms.
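
The trade-off described above, between selection efficiency and false-positive rate, can be made concrete with a small sketch. This is not CERN's code and does not use BigDL or Analytics Zoo. The function name `filter_metrics`, the 0.5 threshold, and the scores are hypothetical values chosen for illustration. The sketch assumes that efficiency means the fraction of true signal events the filter keeps, and that the false-positive rate is the fraction of background events it wrongly keeps.

```python
from typing import Sequence, Tuple


def filter_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (efficiency, false_positive_rate) for a score-based event filter.

    labels: 1 marks a signal event, 0 marks a background event.
    An event is selected when its classifier score is >= threshold.
    """
    kept_signal = kept_background = n_signal = n_background = 0
    for score, label in zip(scores, labels):
        selected = score >= threshold
        if label == 1:
            n_signal += 1
            kept_signal += selected
        else:
            n_background += 1
            kept_background += selected
    # Guard against empty classes so the function never divides by zero.
    efficiency = kept_signal / n_signal if n_signal else 0.0
    false_positive_rate = kept_background / n_background if n_background else 0.0
    return efficiency, false_positive_rate


# Made-up classifier scores for three signal and three background events.
example_scores = [0.9, 0.8, 0.3, 0.6, 0.1, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
efficiency, fpr = filter_metrics(example_scores, example_labels, threshold=0.5)
print(f"efficiency={efficiency:.2f}, false-positive rate={fpr:.2f}")
```

Raising the threshold generally lowers the false-positive rate at the cost of efficiency, so a better classifier is one that improves both at the same threshold.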

Prerequisite knowledge

A basic understanding of Apache Spark and deep learning concepts

What you'll learn

Discover how to simplify the development and deployment of deep learning solutions on big data platforms at scale using open source technologies, and how scientific computing applies industry-standard deep learning solutions in its data pipelines
Learn about the deep learning frameworks BigDL and Analytics Zoo

Sajan Govindan

Intel

Sajan Govindan is a solutions architect on the data analytics technologies team at Intel, focusing on open source technologies for big data analytics and AI solutions. Sajan has been with Intel for more than eighteen years and has many years of experience building analytics and AI solutions across industry verticals and domains, working through the advancements in the Hadoop and Spark ecosystem and in machine learning and deep learning frameworks.