It’s generally good advice to stick to one programming language or one computing environment: the code will most likely be more consistent, more stable, and easier to maintain. Sometimes, however, and especially in exploratory data science projects, it can be more effective or efficient to mix and match. For instance, suppose you want to use a fast machine-learning library that is written in C++, but you work in R and no language bindings are available yet. Or suppose you know how to solve a particular subproblem in R, but your collaborator is using another language.
Jeroen Janssens is the founder and CEO of Data Science Workshops, where he is also an instructor. The company provides on-the-job training and coaching in data visualization, machine learning, and programming. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at the startups YPlan and Outbrain in New York City. He’s the author of Data Science at the Command Line (O’Reilly). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.