Python for Data Analysis

Data Science Ballroom F
Tutorial
Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: 3.88 (8 ratings)

*Attendees wishing to follow along on their own computers should have a working installation of the IPython HTML Notebook and pandas 0.10.1, such as can be obtained by installing either of the free EPDFree (http://www.enthought.com) or Anaconda CE (http://continuum.io/anacondace.html) Python distributions.

Please make sure to download all installations BEFORE you arrive onsite.*

This tutorial will be a hands-on introduction to the essential tools for working with structured data in Python: pandas and NumPy. We’ll look at some of the basic mechanics of the libraries, then work through some real-world examples to illustrate how to load, clean, and wrangle data into the form needed to produce summaries and visualizations.
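
As a rough illustration of the load, clean, and summarize workflow described above, here is a minimal sketch. It is not taken from the tutorial materials: the inline CSV data and its column names (city, month, temp_c) are made up for this example, and it assumes a reasonably recent pandas rather than the 0.10.1 release mentioned on this page.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration only.
RAW_CSV = """city,month,temp_c
Austin,Jan,10.5
Austin,Feb,
Boston,Jan,-1.0
Boston,Feb,1.0
"""

# Load: parse the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Clean: drop rows where the temperature reading is missing.
clean = df.dropna(subset=["temp_c"])

# Summarize: mean temperature per city.
summary = clean.groupby("city")["temp_c"].mean()
print(summary)
```

Reading from a file on disk instead would only mean passing a path to read_csv in place of the StringIO object.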

Wes McKinney

Two Sigma Investments

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library and a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera and was the founder and CEO of DataPad.

Comments

Wes McKinney
06/18/2013 3:51pm PDT

I gave a very similar tutorial recently and posted the materials here: http://wesmckinney.com/blog/?p=687. Sorry for the very long delay.

Minakshi Mukherjee
03/11/2013 4:21am PDT

Could you please share the tutorial here? I am looking for the practice session you discussed that uses the Stack Overflow data set. Thanks, Mina

Wes McKinney
02/26/2013 5:15am PST

If you have EPDFree and find that pandas is not installed (try “import pandas” in a python session) you should run “enpkg install pandas” (possibly with sudo depending on your machine)
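
One way to run the check described above without triggering a traceback is sketched below. This is only an illustration, not something from the thread: the helper name check_module is hypothetical, the list of modules is an assumption about what the tutorial needs, and the script is written for Python 3, whereas the sessions in this thread use Python 2.7. Printing sys.executable also shows which Python installation is actually being picked up.

```python
import importlib
import sys


def check_module(name):
    """Return (found, version) for a module name without raising ImportError."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False, None
    # Not every module defines __version__, so fall back to None.
    return True, getattr(module, "__version__", None)


# Shows which interpreter is running, e.g. a system Python vs. a distribution.
print("Interpreter:", sys.executable)

# Assumed list of packages needed to follow along.
for name in ("numpy", "pandas", "IPython"):
    found, version = check_module(name)
    print(name, "->", version if found else "not installed")
```

If the interpreter path printed is not inside the distribution's directory, the PATH setting is the likely culprit rather than a missing package.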

Wes McKinney
02/26/2013 3:32am PST

It’s not picking up the EPD Python (it would say “Enthought Python Distribution” in the welcome message). You will need the full IPython notebook stack to be able to follow along with the code examples on your own computer. If you can’t get it working it’s no big deal, I’m expecting fewer than half the attendees to be actively following along.

Yang Yang
02/26/2013 3:24am PST

@Wes:

I did set the path to the downloaded EPD python, but it fails to import pandas, because the unzipped EPD dir does not contain anything with the name pandas:

yyang@yyang-ThinkPad-T410:/tools/epd_free-7.3-2-rh5-x86$ export PATH=`pwd`/bin/:$PATH:`pwd`/bin
yyang@yyang-ThinkPad-T410:/tools/epd_free-7.3-2-rh5-x86$ which python
/home/yyang/tools/epd_free-7.3-2-rh5-x86/bin//python
yyang@yyang-ThinkPad-T410:~/tools/epd_free-7.3-2-rh5-x86$ python
Python 2.7.3 (default, Sep 26 2012, 21:57:08)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pandas

My system python worked fine with “import pandas” after I did “pip install pandas numpy”. If that’s all that is needed, I’ll just use my own python.

thanks Yang

Wes McKinney
02/26/2013 1:57am PST

@Yang Make sure that the EPDFree bin directory is in your path and ensure that “import pandas” works inside a python session.

Yang Yang
02/26/2013 12:54am PST

I installed the EPD distro, but it does not contain pandas:

yyang@yyang-ThinkPad-T410:/tools/epd_free-7.3-2-rh5-x86$ pwd
/home/yyang/tools/epd_free-7.3-2-rh5-x86
yyang@yyang-ThinkPad-T410:/tools/epd_free-7.3-2-rh5-x86$ find . -name "pandas"
yyang@yyang-ThinkPad-T410:~/tools/epd_free-7.3-2-rh5-x86$

I did install pandas and numpy through my native Ubuntu dpkg system, and that works. But apart from pandas, what else do we need?

Wes McKinney
02/21/2013 11:41am PST

pandas 0.10.0 from EPDFree should be fine

Sophia DeMartini
02/21/2013 3:38am PST

From speaker Wes McKinney:

Attendees should have a working knowledge of Python syntax, data types, and data structures like lists and dicts. Given the size of the tutorial, I will not be guiding users through writing code but rather focusing on explaining concepts through a series of code examples. If you wish to follow along and run the code on your own computer, please have a working install of pandas 0.10.1 and the IPython notebook, such as you would obtain from installing the free EPDFree or Anaconda CE distributions.

Phillip Burger
02/19/2013 2:33pm PST

I use the Enthought distribution of Python. They are at pandas 0.10.0 (according to their site, which I just checked). The email I received this evening from O’Reilly recommends that we have pandas 0.10.1. I doubt 0.10.1 is absolutely required, but I just want to let you know that the Enthought distribution is currently at the pandas 0.10.0 level.

Kaushal Patel
01/17/2013 6:15am PST

What level of Python experience is required for this class? Are there any other requirements?
