Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Interactive data visualization with Lightning: Using d3, Seaborn, and R

Matthew Conlen (FiveThirtyEight)
9:00am–12:30pm Tuesday, 12/01/2015
Data Science and Advanced Analytics
Location: 334 Level: Intermediate
Average rating: 2.44 (16 ratings)

Prerequisite Knowledge

Attendees should have basic familiarity with Python or Scala and some experience with large-scale data analysis techniques (e.g., MapReduce, Hadoop, Spark).

Description

A Tutorial on Data Visualization with Lightning

Basic data visualization techniques in interactive notebooks – I cover how to set up an interactive notebook environment with Jupyter, and go through some basic examples using Python with popular visualization libraries such as matplotlib and seaborn. I also show how to use custom kernels to integrate languages other than Python into the notebook environment (for example, Scala and R).
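
As a rough illustration of the kind of basic example this part refers to, here is a minimal matplotlib sketch. It assumes matplotlib is installed; the damped sine data is synthetic and made up for illustration, and the session's actual examples may differ.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter, `%matplotlib inline` takes its place
import matplotlib.pyplot as plt

# Synthetic data for illustration only: a damped sine wave.
xs = [i / 10 for i in range(100)]
ys = [math.exp(-x / 5) * math.sin(2 * x) for x in xs]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(xs, ys, label="damped sine")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Basic matplotlib line plot")
ax.legend()

# Render to an in-memory PNG; a notebook would display the figure inline instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

In a notebook cell the last two lines are unnecessary, since the figure is rendered below the cell once the inline backend is active.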

Interactive visualization with Lightning – I introduce the Lightning data visualization server and show how to include it in the notebook environment that was set up in the first portion of the session. I go through a wide range of examples showing the power of the library.

Using these tools with large-scale analysis libraries – What is visualization without analysis?! In this step I integrate Spark, a popular engine for large-scale interactive data analysis, and go through examples using Spark in conjunction with the above data visualization tools.
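
One common pattern when pairing an engine like Spark with a plotting library is to reduce the large dataset to a plot-sized summary first, then hand only the summary to the visualization layer. The sketch below shows that idea in plain Python as a stand-in for a map plus reduce-by-key aggregation. No Spark API is used here, and the function name `bin_counts` and the synthetic data are assumptions for illustration.

```python
import random
from collections import Counter


def bin_counts(values, bin_width):
    """Map each value to its bin's left edge, then count per bin.

    Mirrors a map + reduce-by-key aggregation: the output size depends
    on the number of bins, not on the number of input records.
    """
    counts = Counter()
    for v in values:
        left_edge = (v // bin_width) * bin_width  # "map" step: value -> bin key
        counts[left_edge] += 1                    # "reduce" step: sum per key
    return dict(sorted(counts.items()))


# Synthetic stand-in for a large dataset (illustration only).
rng = random.Random(0)
values = [rng.gauss(50, 15) for _ in range(100_000)]

histogram = bin_counts(values, bin_width=10)
# `histogram` now holds a small number of (bin, count) pairs, small enough
# to pass to any plotting call, for example as bar heights.
```

The design point is that only the aggregated result crosses from the analysis engine to the plot, so the visualization stays responsive regardless of input size.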

Closing the feedback loop – Users of Lightning can set up visualizations to trigger callbacks on specific events. For example, a user could highlight a portion of an image and then run subsequent analysis on the data underlying that region. I show how to set this up and use it to automatically run Spark jobs in response to user interaction with data visualizations.
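
The callback idea described above can be sketched in plain Python. This is not Lightning's API: `Region`, `SelectableImage`, `on_select`, and `select` are hypothetical names invented for illustration, and the user's selection is simulated with a direct method call.

```python
from typing import Callable, List, NamedTuple


class Region(NamedTuple):
    """A rectangular selection in pixel coordinates (half-open ranges)."""
    row_start: int
    row_stop: int
    col_start: int
    col_stop: int


class SelectableImage:
    """Hypothetical stand-in for an interactive image visualization."""

    def __init__(self, pixels: List[List[float]]):
        self.pixels = pixels
        self._callbacks: List[Callable[[Region], None]] = []

    def on_select(self, callback: Callable[[Region], None]):
        """Register a function to run whenever a region is selected."""
        self._callbacks.append(callback)
        return callback

    def select(self, region: Region) -> None:
        """Simulate the user highlighting a region; fires every callback."""
        for callback in self._callbacks:
            callback(region)


# A 4x4 image whose pixel values are 0..15, row by row.
image = SelectableImage([[r * 4 + c for c in range(4)] for r in range(4)])
results = []


@image.on_select
def mean_of_region(region: Region) -> None:
    # Follow-up analysis on the data under the selection. In the setting
    # described above, this is where a Spark job would be submitted.
    rows = image.pixels[region.row_start:region.row_stop]
    cells = [v for row in rows for v in row[region.col_start:region.col_stop]]
    results.append(sum(cells) / len(cells))


image.select(Region(0, 2, 0, 2))  # user highlights the top-left 2x2 block
```

After the simulated selection, `results` holds the mean of the four selected pixels (0, 1, 4, and 5).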

Custom graphics – I show how attendees can create their own domain-specific visualizations using JavaScript, including libraries like d3. I then cover how to integrate these into the data pipeline and use them in a notebook environment.

Matthew Conlen

FiveThirtyEight

Matthew Conlen is a software engineer and information designer in New York. He is a partner at the New York Data Company, and works as the senior developer for Rhizome and as a computational journalist at FiveThirtyEight. Matthew collaborates with researchers from HHMI Janelia on the open-source Lightning data visualization server. He graduated from the University of Michigan with degrees in computer science and applied mathematics.

Comments on this page are now closed.

Comments

Amit Kapoor
12/01/2015 9:50pm +08

Matt – Good session and nice to see integration to create interactive visualisation easily in the notebook.

Kenny Lee
11/26/2015 6:26am +08

Hi Matthew,

Can you share some details about the data and how they will be imported?

I am guessing it would fall under the responsibility of the Spark jobs? Thanks!

Cheers,
Kenny