Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

50 reasons to learn the shell for doing data science

Jeroen Janssens (Data Science Workshops B.V.)
2:55pm–3:35pm Wednesday, 09/12/2018
Data science and machine learning
Location: 1A 12/14
Level: Beginner

Who is this presentation for?

  • Data scientists, analysts, developers, engineers, and system administrators

What you'll learn

  • How the command line can augment your current workflow, and where to learn more about it


“Anyone who does not have the command line at their beck and call is really missing something,” tweeted Tim O’Reilly when Jeroen Janssens’s Data Science at the Command Line was recently made available online for free. As Tim’s tweet suggests, the command line (and its ecosystem of power tools) is not just standing the test of time; it’s more popular than ever. Join Jeroen to learn what you’re missing out on if you’re not applying the command line and many of its power tools to typical data science problems.

The Unix command line isn’t just available on web servers, wireless routers, and supercomputers. It can also be found on macOS, the Raspberry Pi, and, most recently, Windows 10. Although invented decades ago, it turns out to be an amazing environment for efficiently performing tedious but essential data science tasks—and in some situations, it even outperforms new technologies. By combining small, powerful command-line tools like grep, sort, awk, parallel, jq, and csvsql, you can quickly obtain, scrub, explore, and even model your data.
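As a rough illustration of what chaining small tools looks like, the sketch below counts how often each value appears in one column of a CSV and ranks the results. It is not taken from the session: the sample data and the function name `count_languages` are hypothetical, and it uses only standard Unix tools (`tail`, `cut`, `sort`, `uniq`, `awk`) rather than `jq` or `csvsql`, which may not be installed.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny CSV of (name, language) rows.
data='name,language
ann,python
bob,r
cat,python
dan,shell
eve,r
fay,python'

# Hypothetical helper: each stage of the pipeline does one small job.
count_languages() {
  printf '%s\n' "$data" |
    tail -n +2 |              # scrub: drop the header row
    cut -d, -f2 |             # keep only the second column
    sort |                    # group identical values together
    uniq -c |                 # count each group
    sort -k1,1nr -k2 |        # rank by count (descending), then by name
    awk '{ print $2 "," $1 }' # reshape to "value,count"
}

count_languages
```

Each stage reads from standard input and writes to standard output, so any one of them can be swapped for another tool without touching the rest of the pipeline.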

If you’ve ever wondered what the command line is or what it can do for you, this session is for you. As he walks you through applying the command line to some typical data science problems, Jeroen covers the core concepts of the command line; how to break a data science problem into smaller problems, choose the appropriate command-line tools, and chain them together; and how to integrate the command line with your existing data science workflow, whether it consists of the Jupyter Notebook, R, or Excel. You’ll leave ready to get started with the command line and will probably want to learn more about this exciting piece of technology. And why not? It’s been around for almost 50 years. It’s not like it’s going anywhere soon.

Jeroen Janssens

Data Science Workshops B.V.

Jeroen Janssens is the founder and CEO of Data Science Workshops B.V., which provides on-the-job training and coaching in data visualization, machine learning, and programming. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and startups YPlan and Outbrain in New York City. He is the author of Data Science at the Command Line, published by O’Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.