Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Advanced data science, part 1: Data visualization in Jupyter using Matplotlib and Seaborn

Bruno Gonçalves (Data For Science)
1:30pm–3:00pm Wednesday, August 22, 2018

Who is this presentation for?

  • Data scientists and analysts

Prerequisite knowledge

Attendees should have Python 3, Jupyter, Matplotlib, and Seaborn installed. Notebooks will be made available through GitHub.

Materials or downloads needed in advance

  • A laptop with Jupyter, Python 3, Matplotlib, and Seaborn installed
  • Download the contents of the course GitHub repository
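
One way to confirm the setup before the session is a quick check like the sketch below. It is not part of the official course materials. The import names are assumptions: `jupyter_core` is used here as a stand-in for "Jupyter is installed," and `missing_packages` is a hypothetical helper written for this illustration.

```python
import importlib.util
import sys

# Import names assumed to correspond to the packages listed above.
# "jupyter_core" is an assumption: it ships with typical Jupyter installs.
REQUIRED = ["jupyter_core", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be located for import (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The session asks for Python 3, so check the major version of the interpreter.
print("Python 3:", sys.version_info[0] >= 3)

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Running this in the same environment that launches Jupyter gives the most reliable answer, since a laptop can hold several Python installations.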

What you'll learn

  • Learn how to construct informative and appealing visualizations for your data

Description

As David McCandless famously said, “Information visualization is a form of knowledge compression.” In particular, it compresses information into a visual form that the visual system in our brains can interpret easily and correctly.

Bruno Gonçalves offers an overview of the fundamental concepts and ideas behind human visual perception and explains how they inform scientific data visualization. To illustrate these concepts, Bruno shares practical examples using Matplotlib (the workhorse of visualization in Python) and Seaborn (a more recent package that builds on top of Matplotlib and simplifies it for some of the most common use cases). You’ll learn how your eyes and visual cortex process colors and shapes and how to use that knowledge to your advantage.

Topics include:

  • Human perception
  • Color theory
  • Human vision
  • Fundamental principles of analytical design
  • Fundamental tools of visualization
  • Advantages and disadvantages of different chart types:
    • Scatter plots
    • Line charts
    • Bar charts
    • Bubble plots
    • Pie charts
    • Heatmaps
    • Choropleths
  • Matplotlib general concepts and philosophies
  • Seaborn fundamentals and how it improves on Matplotlib
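
To give a flavor of how the two libraries relate, here is a minimal sketch that draws the same data twice: once with plain Matplotlib and once with Seaborn. It is not taken from the session notebooks. The data is synthetic, and the variable names (`x`, `y`, `ax_mpl`, `ax_sns`) are chosen only for this illustration.

```python
import matplotlib

# Non-interactive backend so the script also runs without a display.
# Inside a Jupyter notebook, drop this line to see the figure inline.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Plain Matplotlib: every visual element is specified explicitly.
ax_mpl.scatter(x, y, s=15, alpha=0.7)
ax_mpl.set_title("Matplotlib")
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")

# Seaborn draws onto a Matplotlib Axes; regplot adds the scatter,
# a fitted regression line, and a confidence band in one call.
sns.regplot(x=x, y=y, ax=ax_sns, scatter_kws={"s": 15, "alpha": 0.7})
ax_sns.set_title("Seaborn")
ax_sns.set_xlabel("x")
ax_sns.set_ylabel("y")

fig.tight_layout()
```

Because Seaborn returns and draws on ordinary Matplotlib objects, the titles and axis labels on both panels are set with the same Matplotlib methods.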

Bruno Gonçalves

Data For Science

Bruno Gonçalves is a chief data scientist at Data For Science, working at the intersection of data science and finance. Previously, he was a data science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the physics of complex systems in 2008, he’s been pursuing the use of data science and machine learning to study human behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of computational linguistics, information diffusion, behavioral change, and epidemic spreading. In 2015, he was awarded the Complex Systems Society’s Junior Scientific Award for “outstanding contributions in complex systems science,” and in 2018 he was named a science fellow of the Institute for Scientific Interchange in Turin, Italy.