Make Data Work
Oct 15–17, 2014 • New York, NY

PyData at Strata

Fernando Perez (Lawrence Berkeley National Laboratory and UC Berkeley), Brian Granger (Cal Poly San Luis Obispo), Andy Terrel (Bold Metrics), Peter Wang (Continuum Analytics), Jake Vanderplas (eScience Institute, University of Washington), Olivier Grisel (Inria & scikit-learn), Travis Oliphant (Continuum Analytics, Inc.), William McKinney (Cloudera), Trent Nelson (Continuum Analytics), Kayur Patel (Google), Kester Tong (Google)
9:00am–5:00pm Wednesday, 10/15/2014
Data Science
Location: 1 E12/1 E13
Average rating: 4.43 (14 ratings)

Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including the IPython Notebook, NumPy, matplotlib for visualization, SciPy, and scikit-learn, as well as how to scale Python performance and handle large, distributed data sets. Come see how the leading lights in the Python data community are making Python ever more useful to data analysts and data engineers.


9:00am – 10:30am: IPython – Fernando Pérez (University of California at Berkeley) and Brian Granger (Cal Poly San Luis Obispo)
10:10am – 10:30am: Collaborative Data Science with coLaboratory, Kayur Patel (Google) and Kester Tong (Google)

10:30am – 11:00am: BREAK

11:00am – 12:30pm: Python for Distributed Analytics and Visualization, Andy Terrel (Continuum Analytics) and Peter Wang (Continuum Analytics, Inc.)

12:30pm – 1:30pm: LUNCH

1:30pm – 3:00pm: Room 1 E 12 – Intro to NumPy and matplotlib, Jake Vanderplas (eScience Institute, University of Washington)
1:30pm – 3:00pm: Room 1 E 13 – Intro scikit-learn + pandas for Predictive Modeling, Olivier Grisel (Inria & scikit-learn)

3:00pm – 3:30pm: BREAK

3:30pm – 5:00pm: Room 1 E 12 – SciPy – An Exploration of the Most Useful Bits, Travis Oliphant (Continuum Analytics, Inc.)
3:30pm – 4:15pm: Room 1 E 13 – New and Upcoming Features in Pandas, Wes McKinney (Cloudera)
4:20pm – 5:00pm: Room 1 E 13 – High Performance Python, Trent Nelson (Continuum Analytics)

Fernando Perez

Lawrence Berkeley National Laboratory and UC Berkeley

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, where his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. Fernando created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. He regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of the NumFOCUS Foundation, and a National Academy of Sciences Kavli Frontiers of Science Fellow. He is the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, followed by postdoctoral research in applied mathematics, developing numerical algorithms.

Brian Granger

Cal Poly San Luis Obispo

Brian is an Associate Professor of Physics and Data Science at Cal Poly State University in San Luis Obispo, CA, where he teaches Data Science. He is a leader of the IPython project, a co-founder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. Recently, he co-created the Altair package for statistical visualization in Python. He is an advisory board member of the NumFOCUS Foundation and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.

Andy Terrel

Bold Metrics

Data architect, computational scientist, and technical leader. Andy is the CTO of Bold Metrics, where he is bringing his experience building smart, scalable data systems to the fashion industry. You will also find him leading the board of the NumFOCUS Foundation. A passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.

Peter Wang

Continuum Analytics

Peter Wang is the cofounder and CTO of Continuum Analytics, where he leads the product engineering team for the Anaconda platform and open source projects including Bokeh and Blaze. Peter has been developing commercial scientific computing and visualization software for over 15 years and has software design and development experience across a broad variety of areas, including 3D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences worldwide. Peter has a BA in physics from Cornell University.

Jake Vanderplas

eScience Institute, University of Washington

Jake Vanderplas is the director of research in the physical sciences at the University of Washington’s eScience Institute, where his research is primarily in the area of data-driven astronomy and astrophysics. In addition, Jake is a maintainer and/or frequent contributor to many open source Python projects, including scikit-learn, scipy, mpld3, and others. He occasionally blogs about Python, machine learning, data visualization, open science, and related topics at

Olivier Grisel

Inria & scikit-learn

Olivier Grisel is a software engineer at Inria Saclay, France, where he works on scikit-learn, an open source project for machine learning in Python. Olivier also contributes occasional bug fixes to upstream projects in the NumPy/SciPy ecosystem.

Travis Oliphant

Continuum Analytics, Inc.

Travis Oliphant has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the definitive Guide to NumPy.

Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001-2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of the Biomedical Imaging Lab, where he researched satellite remote sensing, MRI, ultrasound, elastography, and scanning impedance imaging.

From 2007-2011, Travis was the president at Enthought, Inc. During his tenure there, the company grew from 15 to 50 employees, and Travis worked with well-known Fortune 50 companies in finance, oil-and-gas, and consumer-products. He was involved in all aspects of the contractual relationship, including consulting, training, code-architecture, and development.

As CEO of Continuum Analytics, Travis engages customers in finance, consumer products, and oil and gas, develops business strategy, and helps guide the technical direction of the company. He actively contributes to software development and engages with the wider open source community in the Python ecosystem by serving as a director of the Python Software Foundation and a past director of NumFOCUS.

William McKinney


Data systems @ Cloudera. Formerly founder/CEO of DataPad. Author of “Python for Data Analysis” from O’Reilly Media. Created the pandas project.

Trent Nelson

Continuum Analytics

Kayur Patel


Kayur Patel makes data science tools easier to use and studies how people apply machine learning to solve problems and build software. Kayur received his PhD in Computer Science and Engineering from the University of Washington. His graduate work was funded by grants from the NSF and Google as well as the NDSEG and Microsoft Research fellowships. He is currently working at Google and recently taught the Introduction to Data Science course at Columbia.

Kester Tong


I am a software engineer at Google Research. I work on machine learning algorithms and infrastructure, and on a product for collaborative data analysis, coLaboratory.

Comments on this page are now closed.


Andy Terrel
10/20/2014 12:08pm EDT

@Muni, Sorry this took so long:

Muni Xu
10/15/2014 10:24am EDT


Thanks a lot Andy! BTW, very enjoyable talk you gave in the morning!

Andy Terrel
10/15/2014 9:57am EDT


I’ll try to get them posted here later tonight.

— Andy

Muni Xu
10/15/2014 8:37am EDT

Hi Andy,

Where can I get access to all the slides you guys showed this morning?


Andy Terrel
10/15/2014 5:46am EDT

To get the Blaze and Bokeh code working:

conda install blaze bokeh

Olivier Grisel
10/15/2014 5:41am EDT

People who plan to attend the scikit-learn tutorial this afternoon, please check that you have a recent Anaconda installation up and running on your laptop:

Check that you have scikit-learn and pandas there:

conda install scikit-learn pandas

You can fetch the notebooks for this session from:

If you don’t have git installed on your laptop, you can use the “Download ZIP” button on that page.

Note: we will only cover a subset of this tutorial and there is no need to download any data beyond what is already included in the repository.
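One way to confirm that the packages from the `conda install` commands above are actually importable before the session starts is a short Python check. This is a minimal sketch, not part of the tutorial materials: `missing_packages` is a hypothetical helper name, and it only tests whether each module can be located, not which version is installed. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be located in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages mentioned in the conda commands in this thread.
    tutorial_packages = ["sklearn", "pandas", "blaze", "bokeh"]
    missing = missing_packages(tutorial_packages)
    if missing:
        print("Missing:", ", ".join(missing), "(try `conda install` for these)")
    else:
        print("All tutorial packages are importable.")
```

Checking with `find_spec` rather than importing each package keeps the check fast and avoids side effects from heavy imports.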

Sophia DeMartini
10/15/2014 2:43am EDT

Hi David,

I’ve gone in and edited your registration so that you’re now registered for the PyData all-day tutorial.

Thank you,

David Sedgwick
10/15/2014 12:45am EDT

Hello, I would like to register for this tutorial. I’m currently registered for AM and PM tutorials but this all day session appears more interesting. How do I change registration?

Sophia DeMartini
10/08/2014 1:51pm EDT

Hi Jason,

I just double checked your registration, and you’re already signed up to attend PyData Day. If I can help with anything else, please email


Jason Gilbertson
10/08/2014 1:22pm EDT

Would love to attend (already have the 3-day pass) but seem to be missing how to register for this event. All I see in the top right is ‘Registration option’ with a star but nothing happens.

Fernando Perez
10/07/2014 11:06pm EDT

@Luciano, for the IPython tutorial, you can find teaching materials on github:

You can find the installation instructions for IPython itself, in case you don’t have it, here:

Luciano Tozato dos Reis
10/07/2014 12:30pm EDT

Are there materials or downloads that we need to prepare in advance for the PyData tutorial?

Sophia DeMartini
10/01/2014 4:50pm EDT

Hi Dylan,

You should have some basic knowledge of Python. The tutorial presentations will be at an introductory level and will be coding oriented. This will be a great opportunity to explore the use of Python in data analytics whether you are at a novice or intermediate level.


Dylan Patterson
10/01/2014 7:13am EDT

I’m considering joining this seminar for the conference. How much proficiency in python is expected in order to be able to benefit from this course? Is this a coding oriented class, or is it a high level discussion?