Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Jupyter notebooks and production data science workflows

Andrew Therriault (City of Boston)
11:05am–11:45am Friday, August 25, 2017
Usage and application
Location: Sutton Center/Sutton South
Level: Intermediate

Who is this presentation for?

  • Data scientists, software developers, data science engineers, systems architects, and data science managers

Prerequisite knowledge

  • Basic familiarity with Jupyter notebooks
  • An understanding of the essential concepts of production deployments (version control, automation, reusable code, etc.)

What you'll learn

  • Explore options for building data science workflows that offer the benefits of working in Jupyter while still allowing for essential production features such as automation, version control, code reuse, collaboration, and monitoring

Description

Until recently, Jupyter notebooks were primarily a tool for individual data scientists working on their own machines. Software engineers used them mostly for exploratory or one-off analyses or, at most, for early-stage development of code that would eventually need to move elsewhere. When it came time to move to production, a project had to be exported to ordinary Python scripts and worked on like any other code.

That mindset is finally changing in many large organizations, as Jupyter has become a first-class member of enterprise-scale data science stacks. But there's no one right way to use Jupyter in production. Because notebooks can now be run in the background, data scientists have the option of keeping all of their code in Jupyter while retaining the reliability and automation capabilities of standard Python scripts.

But just because you can stay in Jupyter, should you? Andrew Therriault walks you through several production workflows that combine Jupyter with standard Python scripts, modules, and packages. Drawing on real-world examples from his own experience, Andrew covers the pros and cons of each approach, giving you the knowledge you need to choose the right one for your own projects.


Andrew Therriault

City of Boston

Andrew Therriault is the chief data officer for the City of Boston, where he leads Boston's Analytics team, a nationally recognized leader in using data science to improve city operations and make progress in critical areas such as public safety, education, transportation, and health. Previously, Andrew was director of data science for the Democratic National Committee and edited Data and Democracy: How Political Data Science Is Shaping the 2016 Elections (O'Reilly). He holds a PhD in political science from NYU and completed a postdoctoral research fellowship at Vanderbilt.