Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Data science with Unix power tools

Jeroen Janssens (Data Science Workshops B.V.)
1:30pm–5:00pm Tuesday, 09/11/2018
Data science and machine learning
Location: 1A 10
Level: Intermediate
Average rating: 3.00 (3 ratings)

Who is this presentation for?

  • Data scientists, analysts, developers, engineers, and system administrators

Prerequisite knowledge

Materials or downloads needed in advance

What you'll learn

  • Understand how to break a data science problem into smaller problems, choose the appropriate command-line tools, and chain them together
  • Explore the command line and its rich ecosystem of ever-evolving tools
  • Know when it’s appropriate to use the command line instead of a programming language
  • Learn how to obtain, scrub, explore, and model data using only the command line
  • Create your own command-line tools
  • Integrate the command line with your existing workflow, whether it consists of the Jupyter Notebook, R, or Excel
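
To give a flavor of what "create your own command-line tools" can mean, here is a minimal sketch, not taken from the workshop materials. The helper name keep_header and the sample data are hypothetical. The helper passes the first line of its input through unchanged and runs whatever command you give it on the remaining lines, so that tools such as sort can be applied to CSV rows without disturbing the header.

```shell
#!/bin/sh
# keep_header: hypothetical helper (an assumption for illustration).
# It prints the first input line unchanged, then runs the given
# command on the remaining lines of standard input.
keep_header() {
    IFS= read -r header_line || return 1
    printf '%s\n' "$header_line"
    "$@"
}

# Made-up sample data, for illustration only.
data='id,city,amount
1,New York,20
2,Amsterdam,15
3,Tilburg,5'

# Sort the rows numerically by the third column; the header stays on top.
sorted=$(printf '%s\n' "$data" | keep_header sort -t, -k3,3n)
printf '%s\n' "$sorted"
```

Because the helper only reads from standard input and writes to standard output, it can sit anywhere in a pipeline alongside existing tools.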


Although it was invented decades ago, the Unix command line is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.
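
As a rough illustration of chaining small tools together, the sketch below counts how often each value appears in one CSV column. It deliberately uses only standard Unix tools (tail, cut, sort, uniq, sed) rather than parallel, jq, or csvkit, so that it runs without extra installs. The sample data is made up, and the naive comma splitting assumes no quoted fields.

```shell
#!/bin/sh
# Made-up sample data, for illustration only: one row per order.
data='id,city,amount
1,New York,20
2,Amsterdam,15
3,New York,30
4,Tilburg,5
5,Amsterdam,10
6,New York,25'

# Count orders per city, most frequent first:
#   tail -n +2   drops the header row
#   cut          keeps the second column (city)
#   sort         groups identical cities so uniq can count them
#   uniq -c      prefixes each distinct city with its count
#   sort -rn     orders by count, descending
#   sed          rewrites "  3 New York" as "New York,3"
counts=$(printf '%s\n' "$data" \
    | tail -n +2 \
    | cut -d, -f2 \
    | sort \
    | uniq -c \
    | sort -rn \
    | sed 's/^ *\([0-9][0-9]*\) \(.*\)$/\2,\1/')
printf '%s\n' "$counts"
```

Each stage does one small job, which makes it easy to inspect intermediate results by cutting the pipeline short at any point.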

Join Jeroen Janssens for a hands-on workshop based on his book Data Science at the Command Line. Using a real-world use case, you’ll learn how to build fast data pipelines, leverage R and Python at the command line, and quickly visualize and model data.

You’ll leave with a solid understanding of how to integrate the command line into your data science workflow. Even if you’re already comfortable processing data with R or Python, the ability to leverage the power of the command line will make you a more effective and efficient data scientist.


Jeroen Janssens

Data Science Workshops B.V.

Jeroen Janssens is the founder and CEO of Data Science Workshops B.V., which provides on-the-job training and coaching in data visualization, machine learning, and programming. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at the startups YPlan and Outbrain in New York City. He is the author of Data Science at the Command Line, published by O’Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.