Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Why data scientists should love Linux containers

William Benton (Red Hat)
1:15pm–1:55pm Wednesday, 09/12/2018
Secondary topics: Model lifecycle management
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • Data scientists and AI developers

Prerequisite knowledge

  • Familiarity with Python or data science workflows (useful but not required)

What you'll learn

  • How containers and automated build pipelines can realize the potential of interactive notebooks as truly reproducible research
  • How data scientists can use containers and workflows from the DevOps world to communicate with application development teams
  • How container platforms let data scientists scale experiments beyond their laptops with easy access to powerful and specialized hardware, simplify governing access to sensitive internal data, and provide a clearer path to regulatory compliance
  • How to get started using key open source projects that enable data scientists and machine learning engineers to make the most of container technology


Linux containers make it easy for teams to deploy, manage, and scale distributed applications and for operators to exploit compute capacity in the cloud. Although it might not be obvious, a great foundation for production applications can also support the exploratory work of data scientists and machine learning engineers.

William Benton details the advantages of containers for data scientists and AI developers, focusing on high-level tools that will enable you to become more productive and collaborate more effectively. To provide context, William briefly explains what containers are and why developers love them. He then covers four key benefits of containers for data scientists: repeatability, collaboration, scalability, and compliance. You’ll learn how containers fulfill the promise of reproducible research, ease moving techniques from prototype to production, enable painless publishing and collaboration workflows, and empower you to safely develop techniques against sensitive data in a production environment from the comfort of your laptop.

There are myriad tutorial resources explaining how to build and run container images, but these largely assume an audience whose primary responsibilities include packaging, releasing, and managing applications. William focuses on why data scientists should care about containers and the high-level tools built on top of containers that will enhance their daily work. Data scientists will leave with a better understanding of the advantages of containers and concrete suggestions for how to use higher-level tools to make their work more productive. Application and AI developers will learn about the commonalities between engineering workflows and data science workflows and leave with a better understanding of how containers can support their data scientist colleagues and enable cross-functional collaboration.

William Benton

Red Hat

William Benton leads a team of data scientists and engineers at Red Hat, where he has also applied machine learning to problems ranging from forecasting cloud infrastructure costs to designing better cycling workouts. His current focus is investigating the best ways to build and deploy intelligent applications in cloud-native environments, but he has also conducted research and development in the areas of static program analysis, managed language runtimes, logic databases, cluster configuration management, and music technology.