Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

50 reasons to learn the shell for doing data science

Jeroen Janssens (Data Science Workshops)
11:15–11:55 Thursday, 24 May 2018
Data science and machine learning
Location: Capital Suite 12
Level: Beginner
Average rating: 3.00 (2 ratings)

Who is this presentation for?

  • Data scientists, analysts, developers, engineers, and system administrators

What you'll learn

  • Explore the Unix command line and learn how to use it to work with data efficiently and augment your existing data science workflow


“Anyone who does not have the command line at their beck and call is really missing something,” tweeted Tim O’Reilly when Jeroen Janssens’s Data Science at the Command Line was recently made available online for free. As Tim’s tweet suggests, the command line (and its ecosystem of power tools) is not just standing the test of time; it’s more popular than ever. Join Jeroen to learn what you’re missing out on if you’re not applying the command line and many of its power tools to typical data science problems.

The Unix command line isn’t just available on web servers, wireless routers, and supercomputers. It can also be found on macOS, the Raspberry Pi, and, most recently, Windows 10. Although invented decades ago, it turns out to be an amazing environment for efficiently performing tedious but essential data science tasks—and in some situations, it even outperforms new technologies. By combining small, powerful command-line tools like grep, sort, awk, parallel, jq, and csvsql, you can quickly obtain, scrub, explore, and even model your data.
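As a rough illustration of combining such tools, here is a minimal sketch that chains grep, awk, and sort to total a column per group. The sample data and the variable names (data, result) are made up for this example and are not taken from the session.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny CSV of visits per country.
data='country,visits
NL,12
UK,30
NL,8
US,25
UK,5'

# Scrub: grep -v drops the header row.
# Explore: awk sums visits per country; sort ranks the totals, largest first.
result=$(printf '%s\n' "$data" \
  | grep -v '^country,' \
  | awk -F, '{ total[$1] += $2 } END { for (c in total) print c "," total[c] }' \
  | sort -t, -k2,2nr)

printf '%s\n' "$result"
```

Each tool does one small job, and the pipe passes its output to the next. Any single stage can be swapped out, for example jq for JSON input, without touching the rest.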

If you’ve ever wondered what the command line is or what it can do for you, this session is for you. Jeroen walks you through applying the command line to some typical data science problems and covers the core concepts of the command line. You’ll learn how to break a data science problem into smaller problems, choose the appropriate command-line tools, and chain them together. You’ll also learn how to integrate the command line with your existing data science workflow, whether it consists of the Jupyter Notebook, R, or Excel. You’ll leave ready to get started with the command line and will probably want to learn more about this exciting piece of technology. And why not? It’s been around for almost 50 years. It’s not like it’s going anywhere soon.
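One way to picture breaking a problem into smaller ones: the question "which words occur most often in this text?" can be split into steps that each map to a single tool. The sketch below is an assumption about what such a decomposition might look like, not material from the session. The input text and the variable names (text, top_words) are hypothetical.

```shell
#!/bin/sh
# Hypothetical input text.
text='grep, sort, grep; awk sort GREP'

# Step 1: put one word per line (tr -cs turns each run of non-letters into a newline).
# Step 2: lowercase everything so GREP and grep count as the same word.
# Step 3: sort so that identical words end up on adjacent lines.
# Step 4: count adjacent duplicates with uniq -c.
# Step 5: rank by count, highest first.
# Step 6: keep the top three and print them as "word count".
top_words=$(printf '%s\n' "$text" \
  | tr -cs '[:alpha:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 3 \
  | awk '{ print $2, $1 }')

printf '%s\n' "$top_words"
```

Because every step reads lines and writes lines, each one can be run and inspected on its own before the next is added to the chain.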

Jeroen Janssens

Data Science Workshops

Jeroen Janssens is the founder, CEO, and an instructor of Data Science Workshops, which provides on-the-job training and coaching in data visualization, machine learning, and programming. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and startups YPlan and Outbrain in New York City. He’s the author of Data Science at the Command Line (O’Reilly). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.