Expressing Yourself in R

Hadley Wickham (Rice University / RStudio)
GA Ballroom K

There are three main time sinks in any data science task:

1. Figuring out what you want to do.
2. Turning a vague goal into a precise set of tasks (i.e. programming).
3. Actually crunching the numbers.

A well-designed domain-specific language (DSL) tightly coupled to the problem domain can make all three pieces faster. In this talk, I’ll discuss two DSLs built in R: ggvis for visualisation and dplyr for data manipulation. These build on my previous packages ggplot2 and plyr, improving both expressivity and speed.

Data visualisation and manipulation are key parts of the data science process.
ggvis makes it easy to declaratively describe interactive web graphics. It combines a declarative syntax based on ggplot2 with shiny’s reactive programming model and vega’s declarative JS rendering system. dplyr implements the most important verbs of data manipulation in a datastore-agnostic fashion, so you can think about and compute with your data in the same way regardless of whether you’re working with a local in-memory data frame or a remote on-disk database.
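
As a rough illustration of what "verbs of data manipulation" can look like in practice, here is a minimal sketch using dplyr on a local in-memory data frame. It is not taken from the talk: the choice of the built-in `mtcars` dataset, the `hp > 100` threshold, and the names `by_cyl` and `mean_mpg` are all assumptions made for the example.

```r
library(dplyr)

# mtcars ships with base R, so no external data is needed.
# The dataset, threshold, and column names below are illustrative choices.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%           # keep only rows matching a condition
  group_by(cyl) %>%              # group the remaining rows by cylinder count
  summarise(
    n = n(),                     # number of cars in each group
    mean_mpg = mean(mpg)         # average fuel economy in each group
  ) %>%
  arrange(desc(mean_mpg))        # order groups from most to least economical

print(by_cyl)
```

Each step is one verb applied to the result of the previous one. The datastore-agnostic idea described above is that a pipeline written this way could, in principle, be pointed at a table in a remote database instead of a local data frame, assuming a suitable database backend is installed and connected; that setup is not shown here.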

Hadley Wickham

Assistant Professor / Chief Scientist, Rice University / RStudio

Hadley Wickham is Chief Scientist at RStudio. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation. His research focusses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualisation to better understand data and models.