Scalable Analytics with R, Hadoop and RHadoop

Gwen Shapira (Confluent)
Databases & Datastores
Portland 256
Average rating: 2.50 (2 ratings)

Modern data applications often require analyzing multi-terabyte data sets. R is one of the most popular languages for data processing and is best known for its large library of advanced statistical tools. However, using R to analyze multi-terabyte data sets presents challenges: How do we avoid transmitting all the data over the network? How do we scale statistical algorithms? What are the options for integrating R with Hadoop clusters?

This presentation is geared towards R beginners with some knowledge of Hadoop and MapReduce concepts. Attendees will learn important R concepts, effective data wrangling tools, and how to scale R algorithms for large data sets using RHadoop. We will discuss RHadoop in depth and share deployment, scalability, and troubleshooting lessons that we have learned the hard way.

Gwen Shapira


Gwen Shapira is a Solutions Architect at Cloudera and leader of the IOUG Big Data SIG. She studied computer science, statistics, and operations research at Tel Aviv University, and then spent the next 15 years in different technical positions in the IT industry. She specializes in scalable and resilient solutions and helps her customers build high-performance, large-scale data architectures using Hadoop. She is a frequent presenter at conferences and regularly publishes articles in technical magazines and on her blog.