Data engineering at the Large Hadron Collider
Who is this presentation for?
Data engineers, data architects, and developers
The experiments attached to the Large Hadron Collider record approximately 1 billion collisions per second, generating staggering amounts of data. A C++ computational framework, first developed in 1995, continues to evolve and has led to countless significant scientific results. However, this framework is difficult to learn, and a new generation of physics researchers is demanding Python tooling based on general-purpose data science technologies. A new architecture, built on industry-standard tools such as Docker, Kubernetes, Kafka, and Spark, is being deployed to labs around the world to meet researchers' needs in advance of the third run of the collider in 2026.
Ben Galewsky, Gray Lindsey, and Andrew Melo provide background on the unique and amazing challenges of working in high-energy physics and describe how the application of some common open source technologies is helping researchers unlock some of the most challenging mysteries of the universe.
Join in to learn about the Large Hadron Collider and see how an architecture based on familiar technologies can be used to perform complex math at extremely large scales.
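To give a flavor of the "complex math at extremely large scales" the abstract refers to, here is a minimal toy sketch (not the presenters' actual code) of a typical collider computation: reconstructing the invariant mass of a particle from the four-momenta of its decay products, vectorized with NumPy so it scales across many events at once.

```python
import numpy as np

def invariant_mass(p1, p2):
    """Invariant mass of particle pairs from four-momenta.

    p1, p2: arrays of shape (n, 4) holding (E, px, py, pz) per event,
    in natural units (c = 1).
    """
    total = p1 + p2  # sum the four-momenta of the two decay products
    e, px, py, pz = total[:, 0], total[:, 1], total[:, 2], total[:, 3]
    # m^2 = E^2 - |p|^2; clamp at zero to guard against rounding noise
    return np.sqrt(np.maximum(e**2 - px**2 - py**2 - pz**2, 0.0))

# Two toy muons emitted back to back, each carrying 45 GeV of energy
mu1 = np.array([[45.0, 0.0, 0.0, 44.9999]])
mu2 = np.array([[45.0, 0.0, 0.0, -44.9999]])
print(invariant_mass(mu1, mu2))  # prints [90.], near the Z boson mass
```

In a real analysis this same arithmetic is applied to billions of recorded events, which is where distributed engines like Spark enter the picture.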
Prerequisite knowledge
- Familiarity with Docker and Kubernetes
What you'll learn
- Learn about an architecture built from common open source technologies that you can apply to your own problems involving complex math on large datasets
- Discover fascinating facts about the Large Hadron Collider
National Center for Supercomputing Applications
Ben Galewsky is a research programmer at the National Center for Supercomputing Applications at the University of Illinois. He’s an experienced data engineering consultant whose career has spanned high-frequency trading systems to global investment bank enterprise architecture to big data analytics for large consumer goods manufacturers. He’s a member of the Institute for Research and Innovation in Software for High Energy Physics, which funds his development of scalable systems for the Large Hadron Collider.
Fermi National Accelerator Laboratory
Gray Lindsey is a staff scientist at Fermi National Accelerator Laboratory studying Higgs and electroweak physics. He’s focused on developing software and detectors to address the challenge of the high-luminosity upgrade for the Large Hadron Collider and the corresponding upgrade of the Compact Muon Solenoid (CMS) experiment. He’s developed a variety of pattern recognition techniques to demonstrate and help realize new detector systems to efficiently assemble physics data from upgrades to the CMS detector. He also leads the development to make the analysis of those data more efficient and scalable using modern big data technologies.
Vanderbilt University
Andrew Melo is a research professor of physics and a big data application developer at Vanderbilt University. He’s spent the last decade developing and implementing large-scale data workflows for the Large Hadron Collider. Recently his focus has been reimplementing these physics workflows with Apache Spark.