Sep 23–26, 2019

Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo

Sajan Govindan (Intel), Luca Canali (CERN)
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 06/07
Secondary topics:  Deep Learning

Who is this presentation for?

Big Data Analytics & AI Architects, Data Engineers, Data Scientists, Enterprise Analytics & AI decision makers




In this session, you will learn how CERN applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for high energy physics, using the open source BigDL and Analytics Zoo software running on Intel Xeon-based distributed clusters. Technical details and development lessons will be shared through an example: a topology classifier built to improve real-time event selection at the Large Hadron Collider (LHC) experiments.

The classifier has demonstrated very good efficiency while also reducing the false positive rate compared to existing methods. It could be used as a filter to improve the online event selection infrastructure of the LHC experiments, which could benefit from a more flexible and inclusive selection strategy while reducing the downstream resources wasted on processing false positives.

This work is part of CERN’s research on applying deep learning and analytics using open source and industry-standard technologies as an alternative to the existing customized rule-based methods. We show how distributed deep learning solutions and data pipelines can be built quickly and run at scale on Apache Spark using Analytics Zoo and BigDL, open source frameworks that unify analytics and AI on Spark with easy-to-use APIs and development interfaces integrated with big data platforms.
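The two figures of merit named in the abstract, efficiency and false positive rate, can be illustrated with a minimal Python sketch. It is not taken from the CERN pipeline: the function name `efficiency_and_fpr`, the threshold, and the toy labels and scores are all hypothetical and exist only to show how the two quantities are computed for an event filter.

```python
from typing import Sequence, Tuple


def efficiency_and_fpr(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (signal efficiency, false positive rate) at a score threshold.

    labels: 1 for an event of the topology of interest, 0 for background.
    scores: classifier output per event, higher meaning more signal-like.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # An event passes the filter when its score reaches the threshold.
    accepted = [s >= threshold for s in scores]

    n_signal = sum(1 for y in labels if y == 1)
    n_background = sum(1 for y in labels if y == 0)
    true_pos = sum(1 for y, a in zip(labels, accepted) if y == 1 and a)
    false_pos = sum(1 for y, a in zip(labels, accepted) if y == 0 and a)

    # Efficiency: fraction of signal events kept by the filter.
    efficiency = true_pos / n_signal if n_signal else 0.0
    # False positive rate: fraction of background events wrongly kept.
    fpr = false_pos / n_background if n_background else 0.0
    return efficiency, fpr


# Toy, made-up data purely for illustration (not CERN results).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
eff, fpr = efficiency_and_fpr(labels, scores, threshold=0.5)
print(eff, fpr)  # 0.75 0.25
```

Raising the threshold in this toy example lowers both numbers, which is the trade-off an online event selection has to balance: fewer false positives reaching downstream processing against fewer interesting events kept.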

Prerequisite knowledge

Basic knowledge of Apache Spark and deep learning concepts

What you'll learn

How to simplify the development and deployment of deep learning solutions on big data platforms at scale using open source technologies, and how scientific computing is applying industry-standard deep learning solutions in its data pipelines. You will also learn about the deep learning frameworks BigDL and Analytics Zoo.

Sajan Govindan


Sajan Govindan is a solution architect on the Data Analytics Technologies team at Intel, focusing on open source technologies for big data analytics and AI solutions. Sajan has been with Intel for more than eighteen years and has many years of experience building analytics and AI solutions across various industry verticals and domains, working through the advancements in the Hadoop and Spark ecosystem and in machine learning and deep learning frameworks.


Luca Canali


Luca Canali is a data engineer at CERN working with the Hadoop, Spark, streaming, and database services. Luca has 18+ years of experience architecting, deploying, and supporting enterprise-level database and data services, with a special interest in methods and tools for performance troubleshooting. Luca is involved in developing and supporting solutions for data analytics and ML for the CERN community, including the LHC experiments, the accelerator sector, and CERN IT, and enjoys taking part in and sharing results with the data community at large.
