Mar 15–18, 2020

Data engineering at the Large Hadron Collider

Ben Galewsky (National Center for Supercomputing Applications), Gray Lindsey (Fermi National Accelerator Laboratory), Andrew Melo (Vanderbilt University)
5:05pm–5:45pm Tuesday, March 17, 2020
Location: LL20A

Who is this presentation for?

Data engineers, data architects, developers




The experiments attached to the Large Hadron Collider record approximately 1 billion collisions per second, generating staggering amounts of data. A C++ computational framework developed in 1995 continues to evolve and has led to countless significant scientific results. However, this framework is difficult to learn, and a new generation of physics researchers is demanding Python tooling based on general-purpose data science technologies. A new architecture, based on industry-standard tools like Docker, Kubernetes, Kafka, and Spark, is being deployed to labs around the world to meet the researchers’ needs in advance of the third run of the collider in 2026.

Ben Galewsky, Gray Lindsey, and Andrew Melo provide background on the unique and amazing challenges of working in high-energy physics and describe how applying some common open source technologies is helping researchers unlock some of the most challenging mysteries of the universe.

Join in to learn about the Large Hadron Collider and see how an architecture based on familiar technologies can be used to perform complex math at extremely large scales.

Prerequisite knowledge

  • Familiarity with Docker and Kubernetes

What you'll learn

  • Learn about an architecture based on common open source technologies that you can use to solve your problems involving the application of complex math on large datasets
  • Discover fascinating facts about the Large Hadron Collider

Ben Galewsky

National Center for Supercomputing Applications

Ben Galewsky is a research programmer at the National Center for Supercomputing Applications at the University of Illinois. He’s an experienced data engineering consultant whose career has ranged from high-frequency trading systems to global investment bank enterprise architecture to big data analytics for large consumer goods manufacturers. He’s a member of the Institute for Research and Innovation in Software for High Energy Physics, which funds his development of scalable systems for the Large Hadron Collider.

Gray Lindsey

Fermi National Accelerator Laboratory

Gray Lindsey is a staff scientist at Fermi National Accelerator Laboratory studying Higgs and electroweak physics. He’s focused on developing software and detectors to address the challenge of the high-luminosity upgrade for the Large Hadron Collider and the corresponding upgrade of the Compact Muon Solenoid (CMS) experiment. He’s developed a variety of pattern recognition techniques to demonstrate and help realize new detector systems that efficiently assemble physics data from the upgraded CMS detector. He also leads the effort to make the analysis of those data more efficient and scalable using modern big data technologies.

Andrew Melo

Vanderbilt University

Andrew Melo is a research professor of physics and a big data application developer at Vanderbilt University. He’s spent the last decade developing and implementing large-scale data workflows for the Large Hadron Collider. Recently, his focus has been on reimplementing these physics workflows with Apache Spark.
