Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

A billion stars in the Jupyter Notebook

Maarten Breddels (Kapteyn Astronomical Institute, University of Groningen)
5:00pm–5:40pm Thursday, August 24, 2017
Usage and application
Location: Beekman/Sutton North
Level: Intermediate

Who is this presentation for?

  • Data scientists and anyone working with large data volumes

Prerequisite knowledge

  • Familiarity with ipywidgets

What you'll learn

  • Explore vaex and ipyvolume
  • Learn how to work with large tabular datasets and how to explore and visualize these in the Jupyter Notebook

Description

Large astronomical catalogues containing more than a billion stars call for new methods of visualization and exploration. At this scale, scatter plots quickly become too slow to draw and meaningless due to overplotting. One solution to both the performance and the visualization problem is to use binned statistics (e.g., histograms, density maps, and volume rendering in 3D). Maarten Breddels offers an overview of vaex, a Python library that can calculate statistics on a regular n-dimensional grid at a rate of a billion samples per second, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.
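To make the idea of binned statistics concrete, here is a minimal pure-Python sketch of a 2D density map: instead of drawing every point, it counts how many points fall into each cell of a regular grid. This is only an illustration of the technique, not vaex's API or implementation; the function name `density_grid` and the random sample data are assumptions made for the example.

```python
import random


def density_grid(xs, ys, limits, shape):
    """Count points per cell on a regular 2D grid (hypothetical helper)."""
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    grid = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        # Points outside the requested limits are ignored.
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue
        # Map each coordinate to a cell index; clamp to guard against
        # floating-point rounding right at the upper edge.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        grid[i][j] += 1
    return grid


# Made-up sample data: 100,000 points from a 2D Gaussian.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(100_000)]
ys = [rng.gauss(0, 1) for _ in range(100_000)]

# 8x8 counts, whatever the number of input points.
grid = density_grid(xs, ys, limits=((-4, 4), (-4, 4)), shape=(8, 8))
```

The size of the result depends only on the grid shape, not on the number of points, which is why a density map stays readable and cheap to draw where a scatter plot of the same data would not.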

Vaex can process at least a billion samples per second, for instance to compute the mean of a quantity on a regular grid. Statistics can be calculated for any mathematical expression on the data (NumPy style), either on the full dataset or on subsets specified by queries or selections.

However, no proper solution existed to interactively visualize higher-dimensional data in a notebook. This led to the development of ipyvolume, which can render 3D volumes and up to a million glyphs (scatter and quiver plots) as a widget in the Jupyter Notebook. With the browser as a platform and the release of ipywidgets 6.0, these 3D plots can also be embedded in static HTML files and rendered on nbviewer, enabling you to share them with colleagues, view them on your tablet (particularly great for paperless offices), and use them for outreach, press release material, etc. Full-screen stereo rendering allows for a virtual reality experience using your phone and Google Cardboard, a minor investment compared to other VR head-mounted displays. Overlaying 3D quiver plots on a 3D volume rendering allows exploring a 6D (or higher) space.
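The grid statistics described above, the mean of an arbitrary expression computed per bin and optionally restricted to a selection, can be sketched in plain Python as follows. This is a one-dimensional illustration under stated assumptions, not vaex's API: `mean_on_grid` and its parameters (`expression`, `binby`, `selection`) are hypothetical names, and the three rows of data are made up.

```python
import math


def mean_on_grid(rows, expression, binby, limits, shape, selection=None):
    """Mean of expression(row) per bin of binby(row) (hypothetical helper)."""
    lo, hi = limits
    sums = [0.0] * shape
    counts = [0] * shape
    for row in rows:
        # Skip rows that do not match the optional selection.
        if selection is not None and not selection(row):
            continue
        x = binby(row)
        if not (lo <= x < hi):
            continue
        # Clamp to guard against rounding right at the upper edge.
        i = min(int((x - lo) / (hi - lo) * shape), shape - 1)
        sums[i] += expression(row)
        counts[i] += 1
    # Empty bins have no defined mean; report NaN for them.
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


# Made-up rows: a position x and two velocity components.
rows = [
    {"x": 0.5, "vx": 3.0, "vy": 4.0},
    {"x": 0.5, "vx": 0.0, "vy": 1.0},
    {"x": 1.5, "vx": 6.0, "vy": 8.0},
]


def speed(r):
    """The expression to average: speed from the velocity components."""
    return math.sqrt(r["vx"] ** 2 + r["vy"] ** 2)


# Mean speed in two bins of x, over all rows.
all_rows = mean_on_grid(rows, speed, lambda r: r["x"], (0.0, 2.0), 2)

# The same statistic, restricted to rows with vx > 0.
moving = mean_on_grid(rows, speed, lambda r: r["x"], (0.0, 2.0), 2,
                      selection=lambda r: r["vx"] > 0)
```

Because only running sums and counts are kept per bin, the data can be processed in a single pass, and changing the selection leaves the shape of the result unchanged.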

Maarten Breddels

Kapteyn Astronomical Institute, University of Groningen

Maarten Breddels is a postdoctoral researcher at the Kapteyn Astronomical Institute at the University of Groningen (RUG), Netherlands, where he works for the Gaia mission, combining astronomy and IT to enable visualization and exploration of the large dataset this satellite will yield. Maarten has experience with low-level languages, such as Assembly and C, and higher-level languages, including C++, Java, and Python. He holds a bachelor’s degree in information technology and a bachelor’s degree, master’s degree, and PhD in astronomy, with research focused on galactic dynamics.
