Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

The polyglot data scientist

Jeroen Janssens (Data Science Workshops)
17:25–18:05 Thursday, 2/06/2016
Data science & advanced analytics
Location: Capital Suite 8/9 Level: Intermediate
Average rating: 4.00 (5 ratings)

Prerequisite knowledge

Attendees should have some experience with one or more of the following languages: R, Python, or JavaScript.


It’s generally good advice to stick to one programming language or one computing environment. The code will most likely be more consistent, more stable, and easier to maintain. However, sometimes, especially for exploratory data science projects, it can be more effective or efficient to mix and match. For instance, consider the situation where you want to make use of a fast machine-learning library. It turns out that this library is written in C++, but you work in R, and there are no language bindings available yet. Or consider the situation where you know how to solve a particular subproblem in R, but your collaborator is using another language.

Jeroen Janssens discusses three approaches to becoming a polyglot data scientist. Jeroen first explores Beaker Notebook, which allows you to use multiple languages (Python, R, JavaScript, Julia, etc.) in one notebook. He then looks at several language-specific ways of combining programming languages (e.g., how to load R data into MATLAB, how to use a MATLAB package in Python, and how to call Python functions from R). This list of combinations is not exhaustive, but it gives you a good idea of the possibilities. Finally, Jeroen explains how to write your own reusable command-line tools and how to call command-line tools directly from Python and R. The command line is language agnostic, which means that you can combine tools written in just about any language. With a few simple steps, it’s possible to turn your existing code into a command-line tool.
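
As a rough illustration of the third approach, the sketch below shows one way existing Python code might be wrapped as a command-line tool, and how a command-line tool might be called from Python. It is not taken from the talk: the function names `column_mean`, `main`, and `run_tool`, the `-c` and `-d` options, and the file name `colmean.py` are all hypothetical and chosen only for this example.

```python
import argparse
import subprocess
import sys


def column_mean(lines, column=0, delimiter=","):
    """Existing analysis code: mean of one numeric column of delimited text."""
    values = [float(line.split(delimiter)[column]) for line in lines if line.strip()]
    if not values:
        raise ValueError("no input rows")
    return sum(values) / len(values)


def main(argv=None, stdin=None):
    """Thin command-line wrapper: parse options, read stdin, print the result."""
    parser = argparse.ArgumentParser(description="Print the mean of one column.")
    parser.add_argument("-c", "--column", type=int, default=0,
                        help="zero-based column index (hypothetical option)")
    parser.add_argument("-d", "--delimiter", default=",",
                        help="field delimiter (hypothetical option)")
    args = parser.parse_args(argv)
    source = sys.stdin if stdin is None else stdin
    print(column_mean(source, args.column, args.delimiter))
    return 0


def run_tool(command, text):
    """Call any command-line tool from Python: send text to stdin, return stdout."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


# To expose this as a tool, save the file (hypothetically as colmean.py)
# and end it with the usual guard:
#     if __name__ == "__main__":
#         sys.exit(main())
```

With that guard in place, the tool could be run as `python colmean.py -c 1 < data.csv`, and because it only reads standard input and writes standard output, it could be combined with tools written in other languages. `run_tool` works in the other direction: it accepts any command as a list of arguments, so the program it calls does not need to be written in Python.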

Jeroen Janssens

Data Science Workshops

Jeroen Janssens is the founder, CEO, and an instructor of Data Science Workshops, which provides on-the-job training and coaching in data visualization, machine learning, and programming. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and startups YPlan and Outbrain in New York City. He’s the author of Data Science at the Command Line (O’Reilly). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.