Sep 23–26, 2019

Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo

Sajan Govindan (Intel)
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 06/07
Secondary topics: Deep Learning

Who is this presentation for?

  • Big data analytics and AI architects, data engineers, data scientists, enterprise analytics, and AI decision makers




Sajan Govindan dives into how CERN applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for high-energy physics, using the open source BigDL and Analytics Zoo software running on Intel Xeon-based distributed clusters. Sajan outlines technical details and development insights through an example: a topology classifier built to improve real-time event selection at the Large Hadron Collider (LHC). The classifier demonstrated very good efficiency while also reducing the false-positive rate compared to existing methods. It could be used as a filter in the online event selection infrastructure of the LHC experiments, allowing a more flexible and inclusive selection strategy while reducing the downstream resources wasted on processing false positives.
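The two figures mentioned for the classifier, efficiency and false-positive rate, can be illustrated with a minimal sketch. It is not taken from the talk: the function name `selection_metrics`, the toy labels and scores, and the 0.5 threshold are all hypothetical, and it uses plain Python rather than BigDL or Analytics Zoo. It assumes "efficiency" means the fraction of true signal events that the filter accepts.

```python
def selection_metrics(labels, scores, threshold):
    """Return (efficiency, false_positive_rate) for a given score threshold.

    labels: 1 for signal events, 0 for background events (hypothetical toy data)
    scores: classifier output per event, higher means more signal-like
    """
    accepted_signal = accepted_background = n_signal = n_background = 0
    for label, score in zip(labels, scores):
        accepted = score >= threshold
        if label == 1:
            n_signal += 1
            accepted_signal += accepted
        else:
            n_background += 1
            accepted_background += accepted
    # Efficiency: share of signal events kept by the filter.
    efficiency = accepted_signal / n_signal if n_signal else 0.0
    # False-positive rate: share of background events wrongly kept.
    false_positive_rate = accepted_background / n_background if n_background else 0.0
    return efficiency, false_positive_rate


# Toy example with made-up values, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.95]
eff, fpr = selection_metrics(labels, scores, threshold=0.5)
```

Raising the threshold generally lowers the false-positive rate at the cost of efficiency, which is the trade-off an event filter of this kind has to balance.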

This is part of CERN’s research on applying deep learning and analytics using open source and industry-standard technologies as an alternative to the existing customized rule-based methods. Sajan explores how CERN could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, which are open source frameworks unifying analytics and AI on Spark with easy-to-use APIs and development interfaces seamlessly integrated with big data platforms.

Prerequisite knowledge

  • A basic understanding of Apache Spark and deep learning concepts

What you'll learn

  • Discover how to simplify development and deployment of deep learning solutions on big data platforms at scale using open source technologies, and how scientific computing teams apply industry-standard deep learning solutions in their data pipelines
  • Learn about the deep learning frameworks BigDL and Analytics Zoo

Sajan Govindan


Sajan Govindan is a solutions architect on the data analytics technologies team at Intel, focusing on open source technologies for big data analytics and AI solutions. Sajan has been with Intel for more than eighteen years and has extensive experience building analytics and AI solutions across various industry verticals and domains, working through the advancements in the Hadoop and Spark ecosystem and in machine learning and deep learning frameworks.

