Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

A billion stars in the Jupyter Notebook

Maarten Breddels
5:00pm–5:40pm Thursday, August 24, 2017
Usage and application
Location: Beekman/Sutton North Level: Intermediate

Who is this presentation for?

  • Data scientists and anyone working with large data volumes

Prerequisite knowledge

  • Familiarity with ipywidgets

What you'll learn

  • Explore vaex and ipyvolume
  • Learn how to work with large tabular datasets and how to explore and visualize these in the Jupyter Notebook

Description

Large astronomical catalogues containing more than a billion stars require new methods to visualize and explore them. Data volumes of this size call for different visualization techniques, since scatter plots quickly become too slow to draw and, due to overplotting, meaningless. One solution to both the performance and the visualization problem is to use binned statistics (e.g., histograms, density maps, and volume rendering in 3D). Maarten Breddels offers an overview of vaex, a Python library that can compute statistics for a billion samples per second on a regular n-dimensional grid, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.
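To make the binned-statistics idea concrete, here is a minimal NumPy sketch (not vaex's implementation or API) that reduces a million points to a fixed 2D grid of counts, i.e. a density map. The helper name `density_grid`, the grid limits, and the synthetic Gaussian data are illustrative assumptions.

```python
import numpy as np

def density_grid(x, y, limits, shape=(256, 256)):
    """Count samples per cell of a regular 2D grid (a density map).

    Hypothetical helper for illustration.
    x, y   : 1D arrays of equal length
    limits : ((xmin, xmax), (ymin, ymax)); samples outside are ignored
    shape  : number of bins along x and along y
    """
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=limits)
    return counts  # counts[i, j] = number of samples in x-bin i, y-bin j

# Synthetic stand-in for a large catalogue (assumption, not real data).
rng = np.random.default_rng(42)
n = 1_000_000
x = rng.normal(size=n)
y = rng.normal(size=n)

grid = density_grid(x, y, limits=((-4, 4), (-4, 4)), shape=(128, 128))
# Log-scaling compresses the large dynamic range before display,
# e.g. with matplotlib's imshow.
log_density = np.log1p(grid)
```

The cost of drawing the result depends on the grid shape rather than on the number of samples, which is why a density map remains usable where a scatter plot of the same data would not.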

Vaex can process at least a billion samples per second, for instance to compute the mean of a quantity on a regular grid. Statistics can be calculated for any mathematical expression on the data (NumPy style), over the full dataset or over subsets specified by queries or selections.

However, no proper solution existed for interactively visualizing higher-dimensional data in a notebook. This led to the development of ipyvolume, which can render 3D volumes and up to a million glyphs (scatter and quiver plots) in the Jupyter Notebook as a widget. Because the browser is the platform, and with the release of ipywidgets 6.0, these 3D plots can also be embedded in static HTML files and rendered on nbviewer, enabling you to share them with colleagues, view them on your tablet (particularly useful in paperless offices), and use them for outreach, press release material, etc. Full-screen stereo rendering enables a virtual reality experience using your phone and Google Cardboard, a minor investment compared to other VR headsets, and overlaying 3D quiver plots on a 3D volume rendering lets you explore a 6D (or higher) space.
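The binned mean over a selection and the 6D view described above can also be sketched with plain NumPy. This is a hypothetical illustration, not vaex's internals or API: `grid_index` and `binned_count_and_mean` are made-up helpers that map each sample to a cell of a regular n-dimensional grid and accumulate counts and sums with `np.bincount`, optionally restricted to a boolean selection. On a 3D grid, the counts could feed a volume rendering and the three mean velocity components a quiver plot, which together cover a 6D (position plus velocity) space. The data, limits, and the `z > 0` cut are synthetic assumptions.

```python
import numpy as np

def grid_index(coords, limits, shape):
    """Flat (row-major) cell index of each sample on a regular n-D grid.

    Hypothetical helper; assumes finite coordinates. Returns
    (flat_index, inside), where `inside` marks samples lying within the
    half-open limits [lo, hi) along every axis.
    """
    flat = np.zeros(len(coords[0]), dtype=np.int64)
    inside = np.ones(len(coords[0]), dtype=bool)
    for c, (lo, hi), n in zip(coords, limits, shape):
        i = np.floor((c - lo) / (hi - lo) * n).astype(np.int64)
        inside &= (i >= 0) & (i < n)
        flat = flat * n + i  # row-major flattening, one axis at a time
    return flat, inside

def binned_count_and_mean(values, coords, limits, shape, selection=None):
    """Count per cell and mean of `values` per cell (NaN where empty)."""
    mask = np.ones(len(values), dtype=bool) if selection is None else selection
    flat, inside = grid_index(coords, limits, shape)
    keep = mask & inside
    size = int(np.prod(shape))
    counts = np.bincount(flat[keep], minlength=size)
    sums = np.bincount(flat[keep], weights=values[keep], minlength=size)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 -> NaN for empty cells
    return counts.reshape(shape), mean.reshape(shape)

# Synthetic 6D "catalogue": positions and velocities (assumption).
rng = np.random.default_rng(0)
n = 1_000_000
x, y, z = rng.normal(size=(3, n))
vx, vy, vz = rng.normal(size=(3, n))

limits, shape = [(-3, 3)] * 3, (16, 16, 16)
sel = z > 0  # hypothetical selection: only samples above the plane
density, mean_vx = binned_count_and_mean(vx, [x, y, z], limits, shape, sel)
_, mean_vy = binned_count_and_mean(vy, [x, y, z], limits, shape, sel)
_, mean_vz = binned_count_and_mean(vz, [x, y, z], limits, shape, sel)
# Any NumPy expression can serve as the quantity, e.g. a speed:
_, mean_speed = binned_count_and_mean(np.sqrt(vx**2 + vy**2 + vz**2),
                                      [x, y, z], limits, shape, sel)
```

Each statistic is a single linear pass over the samples, and the output size is fixed by the grid, so the result stays small enough to render regardless of how many rows the catalogue has.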

Maarten Breddels

Maarten Breddels is an astronomer, freelance developer, consultant, and data scientist working mostly with Python, C++, and JavaScript in the Jupyter ecosystem. His expertise ranges from fast numerical computation and API design to 3D visualization. He holds a bachelor’s degree in ICT and both a master’s degree and PhD in astronomy.