If you are registered for this tutorial, please download and install Anaconda BEFORE you arrive onsite, for the scikit-learn section.
For "Intro to data visualization with Bokeh," attendees should have Python and Bokeh installed on their system. The simplest way to obtain both is to install the Anaconda Python distribution, which comes with Bokeh and all of its dependencies (Full installation instructions here).
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib, SciPy, and scikit-learn, and explores how to scale Python performance, including handling large, distributed datasets. Come see how the leading lights in the Python data community are making Python ever more useful to data analysts and data engineers.
9:00 AM – 10:30 AM
Data wrangling and intro to pandas
T.J. Alumbaugh and James Powell
T.J. Alumbaugh and James Powell offer a brief tour of the data ingestion and data exploration capabilities of the Python ecosystem. We’ll explore a few datasets using pandas, the Jupyter notebook, and the matplotlib plotting package and learn some basic methods for cleaning up real data found in the wild. Then, we’ll do a few ad hoc analyses to explore the datasets. This is a use case where the PyData stack really shines.
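As a rough illustration of the kind of cleanup described above, here is a minimal sketch. It is not taken from the tutorial materials: the inline dataset, the column names, and the `load_and_clean` helper are all hypothetical and exist only to show a typical pandas workflow (normalize text, coerce numbers, parse dates, drop duplicates and missing values).

```python
import io

import pandas as pd

# Hypothetical messy data: inconsistent case, stray whitespace,
# a missing reading, and a duplicated row.
RAW_CSV = """city,temp_f,date
Austin, 95 ,2016-03-01
austin,N/A,2016-03-02
Seattle,52,2016-03-01
Seattle,52,2016-03-01
"""


def load_and_clean(text):
    """Read CSV text and return a tidied DataFrame (illustrative helper)."""
    # Read everything as strings first so we control each conversion.
    df = pd.read_csv(io.StringIO(text), dtype=str)
    # Normalize city names: trim whitespace and use consistent capitalization.
    df["city"] = df["city"].str.strip().str.title()
    # Convert temperatures to numbers; unparseable values become NaN.
    df["temp_f"] = pd.to_numeric(df["temp_f"].str.strip(), errors="coerce")
    # Parse dates into proper datetime values.
    df["date"] = pd.to_datetime(df["date"])
    # Drop exact duplicate rows and rows without a temperature reading.
    df = df.drop_duplicates().dropna(subset=["temp_f"])
    return df.reset_index(drop=True)


clean = load_and_clean(RAW_CSV)
# A simple ad hoc analysis: average temperature per city.
mean_by_city = clean.groupby("city")["temp_f"].mean()
print(mean_by_city)
```

Reading every column as a string first is one design choice among several; it keeps each conversion explicit, which can make surprises in messy data easier to spot.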
10:30 AM – 11:00 AM
Break
11:00 AM – 11:30 AM
Data wrangling and intro to pandas (continued)
11:30 AM – 12:30 PM
Intro to data visualization with Bokeh
Bryan Van de Ven and Sarah Bird
Bokeh allows you to build interactive visualizations for the Web in Python. Its capabilities range from quick “one-line” charts to streaming datasets to integration with existing plotting libraries such as matplotlib or ggplot. Bryan Van de Ven and Sarah Bird give a quick hands-on introduction to Bokeh’s core features. We’ll do exercises building up a variety of visualizations and finish by discussing topics and questions from participants related to their own datasets and needs.
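For a sense of what a basic Bokeh plot looks like, here is a minimal sketch. It is not from the tutorial materials and assumes a reasonably recent Bokeh release; the data, title, and axis labels are made up for illustration. It uses the `bokeh.plotting` interface to draw a line with markers and renders the result to a standalone HTML string.

```python
import math

from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical data: one period of a sine wave.
x = [i * 0.1 for i in range(64)]
y = [math.sin(v) for v in x]

# Create a figure with the default interactive tools (pan, zoom, and so on).
p = figure(title="Sine wave (illustrative)", x_axis_label="x", y_axis_label="sin(x)")

# Add two glyph renderers: a line and point markers on top of it.
p.line(x, y, line_width=2)
p.scatter(x, y, size=4)

# Render to a standalone HTML document that loads BokehJS from the CDN.
html = file_html(p, CDN, "Sine wave")
```

The resulting `html` string can be written to a file and opened in a browser; in a notebook, `bokeh.io.output_notebook()` followed by `bokeh.io.show(p)` is the more common route.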
12:30 PM – 1:30 PM
Lunch
1:30 PM – 2:30 PM
Intro to data visualization with Bokeh (continued)
2:30 PM – 3:00 PM
Intro to machine learning with scikit-learn
Jake Vanderplas and Katrina Riehl
Jake Vanderplas and Katrina Riehl offer an introduction to the core concepts of machine learning and the scikit-learn package. After introducing the scikit-learn API, we’ll use it to explore the basic categories of machine-learning problems and related topics such as feature selection and model validation, and then practice applying these tools to real-world datasets.
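To make the topics above concrete, here is a minimal sketch that combines feature selection and model validation using scikit-learn's estimator API. It is not from the tutorial materials. It assumes a recent scikit-learn release (one that provides `sklearn.model_selection`), and the choice of the bundled iris dataset, two selected features, and logistic regression is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Load a small bundled classification dataset (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Chain feature selection and a classifier so that selection is
# re-fit inside each cross-validation fold, avoiding data leakage.
model = make_pipeline(
    SelectKBest(f_classif, k=2),         # keep the 2 highest-scoring features
    LogisticRegression(max_iter=1000),   # simple linear classifier
)

# Model validation: 5-fold cross-validated accuracy.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())
```

Wrapping both steps in a pipeline is the design point worth noting: selecting features on the full dataset before cross-validating would let information from the held-out folds influence the model.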
3:00 PM – 3:30 PM
Break
3:30 PM – 5:00 PM
Intro to machine learning with scikit-learn (continued)
T.J. Alumbaugh is a developer at Continuum Analytics. He likes array-oriented computing, Python, and C++.
James Powell is an NYC-based Python programmer with experience in quantitative finance and data science. James is also very active in the Python community, where he organizes NYC Python, the world’s largest and most active Python meetup group. He also works with the numeric and scientific computing nonprofit NumFOCUS to help organize the PyData conference series. James is a frequent speaker at Python conferences and has been invited to speak at events such as PyData New York, PyData London, PyGotham, the conference For Python Quants, and PyCon Spain.
Bryan Van de Ven is a software engineer at Continuum Analytics. Previously, Bryan worked at the Applied Research Labs, developing software for sonar feature detection and classification systems on US Naval submarine platforms, and at Enthought, where he worked on problems in financial risk modeling and fluid mixing simulation. Bryan has also worked on an assortment of iOS projects as an independent consultant. Bryan is a core contributor to Bokeh and has contributed to the Chaco visualization library. Bryan received undergraduate degrees in computer science and mathematics from UT Austin and a master’s degree in physics from UCLA.
Jake Vanderplas is the director of research in the physical sciences at the University of Washington’s eScience Institute, where his research is primarily in the area of data-driven astronomy and astrophysics. In addition, Jake is a maintainer of and/or frequent contributor to many open source Python projects, including scikit-learn, SciPy, and mpld3. He occasionally blogs about Python, machine learning, data visualization, open science, and related topics at Jakevdp.github.io.
Katrina Riehl is a senior data scientist at Continuum Analytics, where she leads the Memex team. Over the last decade, Katrina has worked extensively in the fields of scientific computing, machine learning, data mining, and visualization. Most notably, she worked at Enthought, the signal and information sciences laboratory at the Applied Research Laboratories of the University of Texas at Austin, and Apple before joining Continuum Analytics. Katrina received her MS and PhD in computer science from the University of Texas at Dallas.