Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo
Who is this presentation for?
Big Data Analytics & AI Architects, Data Engineers, Data Scientists, Enterprise Analytics & AI decision makers
In this session, you will learn how CERN applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for High Energy Physics, using the open source BigDL and Analytics Zoo software running on Intel Xeon-based distributed clusters. We will share technical details and development learnings through an example: a topology classifier built to improve real-time event selection at the Large Hadron Collider (LHC) experiments. The classifier has demonstrated very good efficiency while also reducing the false positive rate compared to the existing methods. It could be used as a filter in the online event selection infrastructure of the LHC experiments, allowing a more flexible and inclusive selection strategy while reducing the downstream resources wasted on processing false positives.

This work is part of CERN’s research on applying deep learning and analytics with open source and industry-standard technologies as an alternative to the existing customized rule-based methods. We show how we were able to quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, open source frameworks that unify analytics and AI on Spark with easy-to-use APIs and development interfaces integrated with big data platforms.
Prerequisite knowledge
Basic knowledge of Apache Spark and deep learning concepts
What you'll learn
Sajan Govindan is a Solution Architect in the Data Analytics Technologies team at Intel, focusing on open source technologies for Big Data Analytics and AI solutions. Sajan has been with Intel for more than eighteen years and has extensive experience building Analytics and AI solutions across various industry verticals and domains, working through the advancements in the Hadoop and Spark ecosystem and in Machine Learning and Deep Learning frameworks.
Luca is a data engineer at CERN, working with the Hadoop, Spark, streaming, and database services. Luca has 18+ years of experience architecting, deploying, and supporting enterprise-level database and data services, with a special interest in methods and tools for performance troubleshooting. Luca is involved in developing and supporting solutions for data analytics and ML for the CERN community, including the LHC experiments, the accelerator sector, and CERN IT. He enjoys taking part in the data community at large and sharing results with it.