Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Cataloging the visible universe through Bayesian inference at petascale in Julia

Keno Fischer (Julia Computing)
4:20pm–5:00pm Thursday, March 8, 2018

Who is this presentation for?

  • Data scientists, developers, and CTOs

What you'll learn

  • Explore Celeste, a Bayesian variational inference implementation in Julia that uses machine learning to derive a catalog of astronomical objects from multiterabyte astronomical image datasets

Description

Julia is rapidly becoming a popular language at the forefront of scientific discovery. Keno Fischer explores one of the most ambitious use cases for Julia: using machine learning to derive a catalog of astronomical objects from multiterabyte astronomical image datasets. This work was a collaboration between MIT, UC Berkeley, LBNL, and Julia Computing.

Astronomical catalogs derived from wide-field imaging surveys are the quintessential tool for understanding the universe. Keno offers an overview of Celeste, a Bayesian variational inference code used to construct an astronomical catalog from 55 TB of SDSS imaging data. Celeste is written entirely in the high-productivity programming language Julia. Using over 1.3 million threads on 650,000 Intel Xeon Phi cores of the Cori Phase II supercomputer, Celeste achieves a peak rate of 1.54 PFLOP/s in double precision. Celeste jointly optimizes parameters for 188 million stars and galaxies, loading and processing 178 TB across 8192 nodes in 14.6 minutes. To achieve this, Celeste exploits parallelism at three levels (cluster, node, and thread) and accelerates I/O through Cori’s Burst Buffer. Julia’s native performance lets Celeste use high-level constructs without resorting to handwritten or generated low-level code (C, C++, Fortran, etc.) while still achieving petascale performance.
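As a rough sanity check on the scale of these runs, the aggregate and per-node data rates follow directly from the figures quoted above. This back-of-the-envelope sketch assumes decimal terabytes (10^12 bytes) and an even split of data across the 8192 nodes, and it averages over the full 14.6 minutes, so it is not a measured I/O bandwidth.

```latex
\[
\frac{178\ \text{TB}}{14.6 \times 60\ \text{s}}
  = \frac{178 \times 10^{12}\ \text{B}}{876\ \text{s}}
  \approx 2.03 \times 10^{11}\ \text{B/s}
  \approx 203\ \text{GB/s}
\]
\[
\frac{203\ \text{GB/s}}{8192\ \text{nodes}} \approx 24.8\ \text{MB/s per node},
\qquad
\frac{178\ \text{TB}}{8192\ \text{nodes}} \approx 21.7\ \text{GB per node}
\]
\[
\frac{1.3 \times 10^{6}\ \text{threads}}{650{,}000\ \text{cores}} = 2\ \text{threads per core}
\]
```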
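Celeste’s model is far richer than anything that fits here, but the core idea of variational inference can be shown on a toy problem: replace an intractable posterior with a simple factorized distribution and adjust its parameters to maximize the evidence lower bound (ELBO). The Python sketch below is not Celeste’s model or code. It assumes a textbook conjugate model (observations x_i ~ N(μ, 1/τ) with a Normal-Gamma prior) and runs coordinate-ascent variational inference (CAVI) with a mean-field factorization q(μ)q(τ). The function name `cavi_gaussian` and its default hyperparameters are hypothetical, chosen only for illustration.

```python
def cavi_gaussian(xs, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=50):
    """Toy mean-field CAVI (hypothetical example, not Celeste's model).

    Model: x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 * tau)),
    tau ~ Gamma(shape=a0, rate=b0).
    Variational family: q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).
    Returns (mu_n, lam_n, a_n, b_n).
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    n = len(xs)
    xbar = sum(xs) / n
    # The mean of q(mu) and the shape of q(tau) do not depend on the
    # other factor, so they are computed once.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # start E[tau] at its prior mean
    for _ in range(iters):
        # Update q(mu): its precision scales with the current E[tau].
        lam_n = (lam0 + n) * e_tau
        var_mu = 1.0 / lam_n
        # Expected squared residuals under q(mu): E[(x - mu)^2] = (x - mu_n)^2 + var_mu.
        data_term = sum((x - mu_n) ** 2 for x in xs) + n * var_mu
        prior_term = lam0 * ((mu_n - mu0) ** 2 + var_mu)
        # Update the rate of q(tau), then refresh E[tau] = a_n / b_n.
        b_n = b0 + 0.5 * (data_term + prior_term)
        e_tau = a_n / b_n
    return mu_n, (lam0 + n) * e_tau, a_n, b_n


if __name__ == "__main__":
    mu_n, lam_n, a_n, b_n = cavi_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
    print(f"E[mu] = {mu_n:.3f}, E[tau] = {a_n / b_n:.3f}")
```

With the default hyperparameters, `cavi_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])` gives a posterior mean of 2.5 for μ, pulled from the sample mean of 3 toward the prior mean of 0. Every update here is closed-form because the model is conjugate. In models without that structure, the ELBO typically has to be optimized numerically instead.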

Keno discusses the techniques and methodologies used to achieve this level of performance (the talk divides its time roughly 40%–60% between the two topics). One key design goal for Julia is that ordinary developers should be able to run their code anywhere, from a mobile phone to a Cray supercomputer, without fundamentally changing the tools they use. Celeste demonstrates that Julia scales to extremely large data science problems and provides a roadmap for others to do the same.

Keno Fischer

Julia Computing

Keno Fischer is CTO of Julia Computing, where he leads the company’s efforts in the compiler and developer tools space. Keno has been a core developer of the Julia Language for more than five years. Keno holds an AM in physics and an AB in physics, mathematics, and computer science from Harvard University.