Please bring a laptop and power cord—each class will be centered around hands-on exercises.
Before class, please install both R and the RStudio IDE and ensure that your computer can connect to the Internet. You will also need to download several R packages. We will email you the final list of packages to download the week before the class.
You can find instructions on how to install R, the RStudio IDE, and the R packages at R for Data Science.
9:00 AM – 10:30 AM
R quickstart: Transform and visualize data
Garrett Grolemund explores the most used—and most powerful—parts of the R language. You will learn the best ways to perform the core tasks of data science, including:
These fast and intuitive packages will provide a solid foundation for everything else you do in R.
10:30 AM – 11:00 AM
11:00 AM – 12:30 PM
Validating models in R
Nina Zumel and John Mount
Nina Zumel and John Mount demonstrate a number of techniques, R packages, and R code for validating predictive models, using example code, data, and live demonstrations and exercises. Learn how to determine if there is usable signal in your data, select variables, and choose models using R and R graphics (ggplot2). Increase your statistical efficiency and squeeze more signal out of your data.
12:30 PM – 1:30 PM
1:30 PM – 3:00 PM
Scaling R: Analytics for big data
Stephen Elston teaches techniques for deep exploration and modeling of large, complex datasets with R, including:
3:00 PM – 3:30 PM
3:30 PM – 5:00 PM
Reproducible reports with big data
Garrett Grolemund demonstrates a time-saving workflow that has become the new standard for reproducible research. The R Markdown package makes it easy to document both your code and your results in the same file. With an R Markdown file and the click of a button, you can re-execute your analysis with the most up-to-date code and data to create new results, generate a polished report in a variety of formats (HTML, PDF, DOC, etc.) to share those results, or both. Garrett offers some best practices that further increase the efficiency of reproducible research with R Markdown.
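As a rough illustration of that workflow (a minimal sketch, not the session's own material), the R code below writes a small R Markdown file and renders it to HTML. It assumes the rmarkdown package and pandoc (bundled with the RStudio IDE) are installed. The file name report.Rmd, the report title, and the use of the built-in cars dataset are hypothetical choices made only for this example.

```r
# Minimal sketch: create a tiny R Markdown file, then render it to HTML.
# Assumes rmarkdown and pandoc are available; file name and content are
# hypothetical and chosen only for illustration.
library(rmarkdown)

fence <- strrep("`", 3)  # builds the three-backtick chunk delimiter

rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Summary of the built-in cars dataset:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# render() re-runs every code chunk and returns the path of the output file.
out <- render(rmd_path, output_format = "html_document", quiet = TRUE)
print(out)
```

Re-running render() after the code or data changes regenerates the report from scratch. Passing a different output_format, such as "word_document", should produce the same analysis in another format.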
Garrett Grolemund is a data scientist and chief instructor for RStudio, Inc. Garrett is a longtime user and advocate of R; he wrote the popular lubridate package for working with dates and times in R. Garrett designed and delivered the highly rated O’Reilly video series Introduction to Data Science with R and is the author of Hands-On Programming with R as well as the coauthor, with Hadley Wickham, of R for Data Science. He holds a PhD in statistics and specializes in teaching others how to do data science with open source tools.
Nina Zumel is cofounder and principal at Win-Vector LLC, a data science consultancy based in San Francisco. She frequently writes and speaks on statistics and machine learning. She is also the coauthor of the popular book Practical Data Science with R (Manning Publications, 2014).
John Mount is a principal consultant at Win-Vector LLC, a San Francisco data science consultancy. John has worked as a computational scientist in biotechnology and a stock-trading algorithm designer and has managed a research team for Shopping.com (now an eBay company). John is the coauthor of Practical Data Science with R (Manning Publications, 2014). John started his advanced education in mathematics at UC Berkeley and holds a PhD in computer science from Carnegie Mellon (specializing in the design and analysis of randomized algorithms). He currently blogs about technical issues at the Win-Vector blog, tweets at @WinVectorLLC, and is active in the Rotary. Please contact firstname.lastname@example.org for projects and collaborations.
Stephen Elston is an experienced big data geek, data scientist, and software business leader. Steve is principal consultant at Quantia Analytics, LLC, where he leads the building of new business lines, manages P&L, and takes software products from concept and financing through development, intellectual property protection, sales, customer shipment, and support. Steve is also an instructor for the University of Washington data science program. Steve has over two decades of experience in visualization, predictive analytics, and machine learning, at scales from small to massive, using many platforms including Hadoop, Spark, R, S/SPLUS, and Python. He has created solutions in fraud detection, capital markets, wireless systems, law enforcement, and streaming analytics for the IoT.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.