Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: PyData sessions

9:00am–5:00pm Tuesday, September 26, 2017
Location: 1A 06/07
Ben Lorica (O'Reilly Media), Assaf Araki (Intel), Jacob Schreiber (University of Washington), Alex Ratner (Stanford University), Madeleine Udell (Cornell University), Yunsong Guo (Pinterest), Katherine Heller (Duke University), Alan Nichol (Rasa), Gerard de Melo (Rutgers University), Tamara Broderick (MIT), Inbal Tadeski (Anodot), Daniel Kang (Stanford University), Bichen Wu (UC Berkeley), Shaked Shammah (Hebrew University)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
1:30pm–5:00pm Tuesday, September 26, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 23/24 Level: Intermediate
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Natural language processing is a key component in many data science systems that must understand or reason about text. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, TensorFlow for training custom machine-learned annotators, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
2:55pm–3:35pm Wednesday, September 27, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 08/10 Level: Intermediate
Matthew Rocklin (Anaconda)
Average rating: 4.67 (3 ratings)
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses the architecture and current applications of Dask used in the wild and explores computational task scheduling and parallel computing within Python generally.
4:35pm–5:15pm Wednesday, September 27, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 08/10 Level: Intermediate
Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
Average rating: 5.00 (2 ratings)
Modern data applications combine functions from many optimized libraries (e.g., pandas and TensorFlow) and yet do not achieve peak hardware performance due to data movement across functions. Shoumik Palkar and Matei Zaharia offer an overview of Weld, a new interface to implement functions in these libraries while enabling optimizations across them.