Implementing Map/Reduce applications in languages like Java can be hard, so it is often useful to be able to use Map/Reduce from other languages. In this tutorial, we’ll provide an introduction to RHadoop, an open source Map/Reduce library for R. We assume that attendees have a broad familiarity with R and Hadoop; however, the exercises do not require expertise in either platform.
First, we will discuss the basics of Map/Reduce, a framework for writing massively parallel big data analytics, and the nuances of the RHadoop implementation.
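As a rough illustration of the Map/Reduce model mentioned above, the following is a minimal single-process sketch in Python. It is not RHadoop code and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the three conceptual stages of a job.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(key, values):
    """Sum the counts emitted for a single word."""
    return sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.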
Next, we’ll discuss some common techniques in RHadoop including maintaining application state, processing data that has a Zipfian distribution, representing distributed matrices, performing basic operations over distributed matrices, finding outliers, and debugging.
Finally, we’ll walk through an interactive exercise to show attendees how to create a trending topic analysis using LDA and RHadoop. We’ll begin by showing attendees how to install both Hadoop and the rmr package, which provides Map/Reduce functionality. Then we’ll walk through an interactive coding example that demonstrates how to use RHadoop to create a sliding window analysis of trending topics.
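One way a sliding window analysis can be phrased in Map/Reduce terms is sketched below in plain Python. This is an assumption-laden illustration, not the tutorial's implementation: it assumes each record is a `(time_bucket, topic)` pair whose topic label has already been assigned (for example by a topic model), and the names `window_mapper`, `sliding_window_counts`, and `top_topic` are hypothetical.

```python
from collections import Counter


def window_mapper(record, window_size):
    """Emit one ((window_start, topic), 1) pair for every window containing the record.

    A record in time bucket b falls into the windows that start at
    b - window_size + 1 through b; negative start times are skipped.
    """
    bucket, topic = record
    return [
        ((start, topic), 1)
        for start in range(bucket - window_size + 1, bucket + 1)
        if start >= 0
    ]


def sliding_window_counts(records, window_size):
    """Sum the emitted pairs per (window_start, topic) key, playing the reducer's role."""
    counts = Counter()
    for record in records:
        for key, value in window_mapper(record, window_size):
            counts[key] += value
    return dict(counts)


def top_topic(counts, window_start):
    """Return the most frequent topic in the window starting at window_start, or None."""
    candidates = {
        topic: n for (start, topic), n in counts.items() if start == window_start
    }
    return max(candidates, key=candidates.get) if candidates else None
```

Emitting each record once per overlapping window lets every window be reduced independently, at the cost of duplicating each record `window_size` times.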
Edmund Kohlwey is a developer and data scientist at Booz Allen Hamilton. For the last three years, he has helped government clients adopt and develop their big data capabilities across many different problem domains.
Stephanie Beben is an analytics engineer and developer at Booz Allen Hamilton with two years of experience designing and implementing solutions to big data problems using cloud technologies for U.S. government clients.
Prior to joining Booz Allen Hamilton, Stephanie received an M.S. in Mathematics from Texas A&M University.