Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

R Day

Garrett Grolemund (RStudio), Yihui Xie (RStudio, Inc.), Nathan Stephens (RStudio, Inc.), Randall Pruim (Calvin College)
9:00am–5:00pm Tuesday, 09/29/2015
Data Science & Advanced Analytics
Location: 1 E16 / 1 E17
Average rating: 4.20 (15 ratings)
Slides:   external link

TUTORIAL PREREQUISITES

  • Please bring a laptop and power cord to class – each class will be centered around hands-on exercises.
  • Before class, please install both R and the RStudio IDE, and ensure that your computer can connect to the internet. We will use the following R packages in class, so you should install them ahead of time: tidyr, devtools, dplyr, ggplot2, scales, rmarkdown, knitr, DT, shiny, DBI, and RSQLite, as well as the reportsWS package, which must be installed with the command:
    devtools::install_github("rstudio/reportsWS")
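
One possible way to install everything in a single step is sketched below. It is only an illustration: the helper names `cran_packages`, `missing_packages`, and `install_r_day_packages` are hypothetical and not part of the class materials, and the sketch assumes a default CRAN mirror is configured and that github.com is reachable from your network.

```r
# CRAN packages listed in the prerequisites above
cran_packages <- c("tidyr", "devtools", "dplyr", "ggplot2", "scales",
                   "rmarkdown", "knitr", "DT", "shiny", "DBI", "RSQLite")

# Return the packages from `wanted` that are not yet installed locally
missing_packages <- function(wanted) {
  setdiff(wanted, rownames(installed.packages()))
}

# Install whatever is missing from CRAN, then reportsWS from GitHub
# (hypothetical helper; needs internet access)
install_r_day_packages <- function() {
  to_install <- missing_packages(cran_packages)
  if (length(to_install) > 0) install.packages(to_install)
  devtools::install_github("rstudio/reportsWS")
}

# install_r_day_packages()  # uncomment to run the installation
```

Nothing is installed until `install_r_day_packages()` is called, so the block can be pasted into the console and inspected first.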

Description

From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a range of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading R developers and committers, the people who keep the R ecosystem abreast of the challenges facing analysts and others who work with data.

Schedule

R Quickstart: Wrangle, transform, and visualize data
Instructor: Garrett Grolemund
90 minutes

This 90-minute quickstart will teach you the most used, and most powerful, parts of the R language. You will learn the best ways to perform the core tasks of data science:

  • How to wrangle your data (with the tidyr package)
  • How to transform your data (with the dplyr package)
  • How to visualize your data (with the ggplot2 package)

These fast and intuitive packages will provide a solid foundation for everything else you do in R.

Work with Big Data in R
Instructor: Nathan Stephens
90 minutes

R is the go-to language for data exploration and development, but what role can R play in production with big data? In this class, RStudio’s solutions engineer demonstrates a pragmatic approach for pairing R with big data. You will learn to use R’s familiar dplyr syntax to query big data stored in a server-based data store, like Amazon Redshift or Google BigQuery. We will also discuss how to generalize the process to other big data stores, and how best to use R within a big data pipeline.

Reproducible Reports with Big Data
Instructor: Yihui Xie
90 minutes

This tutorial will teach you a time-saving workflow that has become the new standard for reproducible research. The R Markdown package makes it easy to document both your code and your results in the same file. With an R Markdown file and the click of a button, you can

  • re-execute your analysis with the most up-to-date code and data to create new results, and/or
  • generate a polished report in a variety of formats (HTML, PDF, Word, etc.) to share your results.

This class will demonstrate some best practices that further increase the efficiency of reproducible research with R Markdown.

Interactive Shiny Applications built on Big Data
Instructor: Garrett Grolemund
90 minutes

R’s Shiny package lets you move beyond static reports to easily build interactive applications powered by R. Run your Shiny apps locally, or share them over a server with clients, customers, and colleagues. Your visitors will be in the driver’s seat. They can explore data, monitor dashboards, run R analyses, or do anything else you prepare for them, all without knowing any R code. If a picture is worth 1000 words, a Shiny app is worth a million. In this tutorial, you will learn the basics of creating Shiny apps, as well as the best practices for using big data with your apps.

Garrett Grolemund

RStudio

Garrett Grolemund is the editor-in-chief of shiny.rstudio.com, the development center for the Shiny R package, and is the author of Hands-On Programming with R as well as Data Science with R, a forthcoming book by O’Reilly Media. Garrett works as a data scientist and chief instructor for RStudio, Inc.

Yihui Xie

RStudio, Inc.

Yihui Xie is an active R user and the author of several R packages, such as animation, formatR, Rd2roxygen, and knitr, among which the animation package won the 2009 John M. Chambers Statistical Software Award (ASA). He is also the author of the book Dynamic Documents with R and knitr. In 2006 he founded the “Capital of Statistics” (http://cos.name), which has grown into a large online community on statistics in China. He initiated the first Chinese R conference in 2008 and has been organizing R conferences in China since then. During his PhD training at Iowa State University, he won the Vince Sposito Statistical Computing Award (2011) and the Snedecor Award (2012) in the Department of Statistics. His research interests include interactive statistical graphics, statistical computing, and web applications.

Nathan Stephens

RStudio, Inc.

Nathan Stephens recently joined RStudio as director of solutions engineering. His background is in applied analytics and consulting. He has experience building data science teams, creating innovative data products, analyzing big data, and architecting analytic platforms. He was an early adopter of R and has introduced it into many organizations. Nathan holds an MS in statistics from Brigham Young University.

Randall Pruim

Calvin College

Randall Pruim is a professor of mathematics and statistics at Calvin College, author of Foundations and Applications of Statistics: An Introduction Using R, and the maintainer of several R packages, including fastR and mosaic. His research interests include statistical computing and statistics education (especially for students in the natural sciences).

Comments on this page are now closed.

Comments

Garrett Grolemund
10/02/2015 8:14am EDT

David, you can download the course materials from the link at the top of this page.

David Gardner
09/29/2015 6:47am EDT

Will the presentation materials be made available for this session?

Elizabeth Barayuga
09/28/2015 1:53pm EDT

I am trying to install ggplot2, but I am getting the error message "package ggplot2 is not available (for R version 3.1.2)".
Is there a different version of R that I should be working with?

Yana Kane-Esrig
09/25/2015 10:18am EDT

Hi Garrett,
Thank you.

The “load” commands worked fine (once I noticed and fixed the fact that copying and pasting the commands from this website into RStudio on PC put the wrong kind of double quotes around the url).

It took me a few trials to get to the point where I noticed the problem with the double quotes, and in the process of poking around I had tried utils::setInternet2()
I do not know whether this step was necessary or not for enabling the load commands to work, so I am mentioning it in case other people with corporate laptops are having trouble.

See you at the tutorial.

Yana

Garrett Grolemund
09/25/2015 9:56am EDT

Hi everyone,

Some students have reported curl-related trouble installing the reportsWS package from GitHub, and others may have had trouble but not reported it.

These error messages usually suggest that you are downloading from behind a firewall or proxy server that is not configured to allow downloads from GitHub (a common scenario on corporate networks). Your IT team may be able to help you fix the problem, if you wish.

WHY YOU SHOULD NOT WORRY

We will have a backup server with the package installed that you can use at the conference if necessary.

We will only use the reportsWS package for the data sets that it contains. If you would like to have those data sets on your local computer, you can download them and load them into your R session one at a time with the code below. Each line will create a data frame with the name of the .rdata file. Note that this method will only work on a computer/network that allows downloads from github.com. Also note, as mentioned above, that this will not be necessary to participate in the class.

load(url("https://github.com/rstudio/reportsWS/blob/master/data/artists.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/births.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/bnames.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/bnames2.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/cases.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/pollution.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/songs.rdata?raw=true"))
load(url("https://github.com/rstudio/reportsWS/blob/master/data/storms.rdata?raw=true"))
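
The same downloads could also be written as a loop, which avoids retyping the URL eight times. This is only a sketch: the names `datasets`, `rdata_url`, and `load_all_datasets` are hypothetical helpers introduced for illustration, and, like the commands above, it will only work on a network that allows downloads from github.com.

```r
# Data set names taken from the load() commands above
datasets <- c("artists", "births", "bnames", "bnames2",
              "cases", "pollution", "songs", "storms")

# Build the raw-download URL for one .rdata file in the reportsWS repository
rdata_url <- function(name) {
  paste0("https://github.com/rstudio/reportsWS/blob/master/data/",
         name, ".rdata?raw=true")
}

# Load every data set into the global environment
# (hypothetical helper; requires access to github.com)
load_all_datasets <- function() {
  for (name in datasets) {
    load(url(rdata_url(name)), envir = globalenv())
  }
}

# load_all_datasets()  # uncomment to download and load all eight data sets
```

Typing the code, or pasting it into a plain-text editor first, keeps the straight double quotes that R requires.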

WHY YOU SHOULD WORRY

github.com is a widely used resource in the R community. Many R packages are developed and disseminated through github.com. If you would like to become a flexible R user, it will be important to acquire the ability to install packages from github.com onto the computer that you use.

Yana Kane-Esrig
09/18/2015 3:20pm EDT

Sorry, I should have mentioned: I am on a Windows 7 corporate laptop. I did download the latest version of R and of RStudio.

Yana Kane-Esrig
09/18/2015 3:16pm EDT

Hi,
I tried to download the required packages.
I was not able to install reportsWS. (The other recommended packages installed without a problem.)

Here is what I tried.

> library("devtools")
> devtools::install_github("rstudio/reportsWS")
Downloading GitHub repo rstudio/reportsWS@master
Error in curl::curl_fetch_memory(url, handle = handle) :
Peer certificate cannot be authenticated with given CA certificates
> library("reportsWS")
Error in library("reportsWS") : there is no package called ‘reportsWS’
> library("rstudio/reportsWS")
Error in library("rstudio/reportsWS") :
there is no package called ‘rstudio/reportsWS’
> devtools::install_github("rstudio/reportsWS")
Downloading GitHub repo rstudio/reportsWS@master
Error in curl::curl_fetch_memory(url, handle = handle) :
Peer certificate cannot be authenticated with given CA certificates

Garrett Grolemund
09/02/2015 11:36am EDT

Rajesh, it looks like you are right about the registration options. Or perhaps the one-day tickets have sold out. As speakers, we aren’t involved with the registration decisions.

Rajesh Sahasrabuddhe
09/02/2015 11:11am EDT

Just a couple of quick notes:
For some reason my browser kept loading a cached version of this site, so I never saw the agenda when it was posted.

Last year, there was an option for one-day registration for just R-day. That does not seem to be available this year – or did I miss it?

Garrett Grolemund
08/27/2015 8:55am EDT

Just to be clear: the final agenda is now displayed on this webpage. I look forward to seeing everyone at R Day!

Garrett Grolemund
07/21/2015 11:43am EDT

Shawn, the agenda will be finalized soon. We’re evaluating some cutting-edge technologies to highlight.

Shawn Gerou
07/21/2015 11:39am EDT

Apologies if I missed it, but is there an agenda yet for this?

Thanks

Shanfan Huang
05/26/2015 6:26am EDT

Can you please indicate the prerequisite knowledge or skill level expected of the target audience?