Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Dask: Flexible analytic computing for Python

Matthew Rocklin (Anaconda)
11:15–11:55 Wednesday, 24 May 2017
Level: Intermediate
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • An understanding of the Python ecosystem

What you'll learn

  • How Python can scale to distributed systems by combining existing libraries with a newer library, Dask

Description

The Python data science ecosystem (NumPy, pandas, and scikit-learn) is efficient and intuitive for advanced analytics workloads. Unfortunately, these tools are restricted to data that fits in memory and to computation on a single core. Dask is a parallel computing library that complements the Python ecosystem by providing a distributed parallel framework for high-performance task scheduling.

Dask now parallelizes Python libraries like NumPy and pandas, parts of scikit-learn, and more custom algorithms. This work was done in collaboration with those core development communities and has led to a seamless big data experience for Python users doing data analysis and complex analytics.

Matthew Rocklin discusses the basic architecture of Dask, the classes of applications in which it is commonly useful, and how it fits into the broader Hadoop ecosystem.
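
To make the idea of task scheduling in the description more concrete, here is a minimal sketch using only the Python standard library. It is not Dask's implementation or API. The dictionary-of-tasks layout, the key names, and the run_graph and dependencies helpers are all assumptions made for illustration: each key maps either to plain data or to a tuple of a function followed by its arguments, and tasks whose inputs are ready run in a thread pool.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from operator import add


def dependencies(graph, key):
    """Return the keys that the task stored under `key` depends on."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        return [a for a in task[1:] if isinstance(a, str) and a in graph]
    return []  # plain data has no dependencies


def run_graph(graph, target, max_workers=4):
    """Compute `target`, running independent tasks in parallel (hypothetical helper)."""
    # Collect only the keys that `target` actually needs.
    needed, stack = set(), [target]
    while stack:
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(dependencies(graph, key))

    results = {}       # key -> computed value
    pending = {}       # future -> key of the task it is computing
    waiting = set(needed)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while waiting or pending:
            ready = [k for k in waiting
                     if all(d in results for d in dependencies(graph, k))]
            if not ready and not pending:
                raise ValueError("task graph contains a cycle")
            for key in ready:
                waiting.discard(key)
                task = graph[key]
                if isinstance(task, tuple) and task and callable(task[0]):
                    func, *args = task
                    # Replace references to other keys with their results.
                    args = [results[a] if isinstance(a, str) and a in graph else a
                            for a in args]
                    pending[pool.submit(func, *args)] = key
                else:
                    results[key] = task  # plain data is its own result
            if pending:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    results[pending.pop(future)] = future.result()
    return results[target]


# A dataset split into three chunks: sum each chunk, then combine the partial sums.
graph = {
    "chunk-0": [1, 2, 3],
    "chunk-1": [4, 5, 6],
    "chunk-2": [7, 8, 9],
    "sum-0": (sum, "chunk-0"),
    "sum-1": (sum, "chunk-1"),
    "sum-2": (sum, "chunk-2"),
    "partial": (add, "sum-0", "sum-1"),
    "total": (add, "partial", "sum-2"),
}

total = run_graph(graph, "total")
print(total)  # 45
```

The three per-chunk sums do not depend on each other, so the scheduler is free to run them concurrently. This is the same pattern that lets a chunked computation work on data larger than any single piece held in memory at once.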

Matthew Rocklin

Anaconda

Matthew Rocklin is an open source software developer at Anaconda focusing on efficient computation and parallel computing, primarily within the Python ecosystem. He has contributed to many of the PyData libraries and today works on Dask, a framework for parallel computing. Matthew holds a PhD in computer science from the University of Chicago, where he focused on numerical linear algebra, task scheduling, and computer algebra.

Comments

Matthew Rocklin | COMPUTATIONAL SCIENTIST
26/05/2017 9:37 BST

Slides available here: http://matthewrocklin.com/slides/strata-london-2017.html#/

Notebooks here: https://github.com/dask/dask-ec2/tree/master/notebooks

Michal Kucharczyk | BI & RISK MANAGEMENT SPECIALIST
26/05/2017 9:16 BST

Hello Matthew, do you plan to share the slides?