Sep 23–26, 2019

Using Spark for crunching astronomical data on the LSST scale

Petar Zecevic (SV Group)
11:20am–12:00pm Thursday, September 26, 2019
Location: 1E 07/08
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data engineers, software architects, and big data and astronomy enthusiasts




The slew of upcoming large-scale astronomical surveys promises exciting times for astronomy and computer science. One of the most important of these surveys is the Large Synoptic Survey Telescope (LSST). Its unique design and excellent location allow it to go both wide and deep at the same time, covering large regions of the sky while capturing images of the faintest objects. LSST will produce one 3.2-gigapixel image every 20 seconds, every night, for 10 years, resulting in the first “video” of the deep sky in history and (according to some estimates) about 80 PB of data. Furthermore, the worldwide scientific community will receive real-time alerts triggered by changes in the sky within 60 seconds of their detection.

Petar Zecevic explains how the LSST image processing pipeline uses acquired images to produce catalogs of astronomical objects. Together with colleagues from the University of Washington, Petar built Astronomy Extensions for Spark (AXS), a system based on Apache Spark for processing and quickly cross-matching catalog data. You’ll learn about its architecture and the design choices behind its performance.
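
To make "cross-matching" concrete, here is a minimal sketch in plain Python. It is not the AXS API and does not use Spark. It only illustrates the general idea of pairing objects from two catalogs whose sky positions lie within a small angular radius of each other. Bucketing objects into declination "zones" to limit the number of comparisons is an assumption made for this illustration, and the function names, catalog layout (id, RA, Dec in degrees), and sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def zone_of(dec, zone_height_deg):
    """Index of the horizontal declination strip ("zone") containing dec."""
    return int(math.floor((dec + 90.0) / zone_height_deg))


def crossmatch(left, right, radius_deg, zone_height_deg=1.0):
    """Return (left_id, right_id, separation_deg) for all pairs within radius_deg.

    left and right are lists of (id, ra_deg, dec_deg) tuples.
    """
    # Bucket the right-hand catalog by zone so each left object is
    # compared only against objects in nearby declination strips.
    zones = defaultdict(list)
    for obj in right:
        zones[zone_of(obj[2], zone_height_deg)].append(obj)

    matches = []
    for lid, lra, ldec in left:
        # A match may sit in a neighboring zone, so scan every zone
        # that overlaps [dec - radius, dec + radius].
        z_lo = zone_of(ldec - radius_deg, zone_height_deg)
        z_hi = zone_of(ldec + radius_deg, zone_height_deg)
        for z in range(z_lo, z_hi + 1):
            for rid, rra, rdec in zones.get(z, ()):
                sep = angular_sep_deg(lra, ldec, rra, rdec)
                if sep <= radius_deg:
                    matches.append((lid, rid, sep))
    return matches


# Hypothetical sample catalogs: (id, RA in degrees, Dec in degrees).
catalog_a = [("a1", 10.0, 20.0), ("a2", 200.0, -45.0), ("a3", 5.0, 0.99999)]
catalog_b = [("b1", 10.0002, 20.0001), ("b2", 50.0, 20.0), ("b3", 5.0, 1.00001)]

two_arcsec = 2.0 / 3600.0
matches = crossmatch(catalog_a, catalog_b, radius_deg=two_arcsec)
matched_ids = sorted((l, r) for l, r, _ in matches)
```

In this sample, a3 and b3 fall on opposite sides of a zone boundary, which is why the search scans neighboring zones rather than only the object's own. A distributed system would partition and join the catalogs across many machines instead of looping in memory.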

Prerequisite knowledge

  • A basic understanding of distributed data processing and SQL

What you'll learn

  • Gain an introduction to LSST data-processing pipelines and details of a distributed spatial cross-matching approach

Petar Zecevic

SV Group

Petar Zecevic is the chief technology officer of SV Group in Zagreb, Croatia, and is pursuing his PhD at the University of Zagreb. He’s collaborating with the Astronomy Department at the University of Washington on building new methods for processing images and data from future astronomical surveys. Previously, he was a Java developer and worked as a software architect, team leader, and IBM software consultant. After switching to the exciting new field of big data technologies, he wrote Spark in Action (Manning, 2016), and he now works primarily on Apache Spark and big data projects.
