Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Machine learning in R

Jared Lander (Lander Analytics)
9:00am–12:30pm Tuesday, December 5, 2017

Who is this presentation for?

  • Data scientists and machine learning practitioners

Prerequisite knowledge

  • A basic understanding of R, linear models, and classification

Materials or downloads needed in advance

In addition to the latest version of R (and, optionally, RStudio), attendees should obtain a prepared RStudio project by either git cloning or downloading and unzipping the folder from

Attendees should also have the latest versions of the following packages installed:

  • tidyverse
  • glmnet
  • xgboost
  • DiagrammeR
  • coefplot
  • dygraphs

What you'll learn

  • Understand regularization, boosted trees, and cross-validation


Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today’s incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.


Elastic net

  • Penalized regression with the lasso and ridge methods
  • Fitting models with glmnet
  • The coefficient path
  • Coefficients with coefplot
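As a rough illustration of the elastic net topics above, here is a minimal sketch that fits a penalized regression with glmnet and inspects its coefficients. The mtcars data, the alpha value, and the penalty value of 1 are assumptions chosen only for demonstration; they are not the session's materials.

```r
library(glmnet)
library(coefplot)

# Illustrative data only: mtcars ships with R and is not the workshop dataset
x <- as.matrix(mtcars[, -1])  # predictors (every column except mpg)
y <- mtcars$mpg               # response

# alpha = 1 is the lasso, alpha = 0 is ridge; values in between blend the two penalties
fit <- glmnet(x, y, family = "gaussian", alpha = 0.5)

# Coefficient path: each curve traces one coefficient as the penalty lambda changes
plot(fit, xvar = "lambda", label = TRUE)

# Coefficients at a single penalty value (1 is arbitrary here)
coef(fit, s = 1)

# The same coefficients drawn with coefplot, largest first
coefplot(fit, lambda = 1, sort = "magnitude")
```

In practice the penalty would usually be chosen by cross-validation (for example with cv.glmnet) rather than fixed by hand.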

Boosted trees

  • Making classifications (and regression) using recursive partitioning
  • Fitting models with xgboost
  • Making compelling visualizations with xgb.plot.multi.trees
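The boosted tree topics above might look something like the following sketch. It assumes the agaricus.train example data bundled with xgboost and the xgb.train interface; the parameter values are illustrative rather than recommendations, and the plotting call needs DiagrammeR installed.

```r
library(xgboost)

# Illustrative data only: the mushroom dataset bundled with xgboost
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

# Each boosting round adds a shallow tree fit to the errors of the current ensemble
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 3),
  data = dtrain,
  nrounds = 10
)

# Predicted probabilities for the positive class
pred <- predict(bst, dtrain)

# Collapse the whole ensemble into one summary diagram
xgb.plot.multi.trees(bst, features_keep = 3)
```

Changing the objective (for example to a squared-error objective) turns the same workflow into a regression.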


Cross-validation

  • The reasoning for and process behind cross-validation
  • Cross-validating glm models with cv.glm
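A minimal sketch of cross-validating a glm with cv.glm follows. It assumes the boot package (which provides cv.glm), the built-in mtcars data, and an arbitrary choice of five folds; none of these come from the session materials.

```r
library(boot)

# Illustrative data only: mtcars ships with R
mod <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian)

set.seed(1234)  # fold assignment is random, so fix the seed for repeatability
cv <- cv.glm(data = mtcars, glmfit = mod, K = 5)

# delta[1] is the raw cross-validated error (mean squared error by default);
# delta[2] is a bias-adjusted version of the same estimate
cv$delta
```

Comparing cv$delta across candidate models gives an out-of-sample basis for choosing between them, which is the core reasoning behind cross-validation.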

Jared Lander

Lander Analytics

Jared P. Lander is chief data scientist of Lander Analytics, where he oversees the long-term direction of the company and researches the best strategy, models, and algorithms for modern data needs. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing. In addition to his client-facing consulting and training, Jared is an adjunct professor of statistics at Columbia University and the organizer of the New York Open Statistical Programming Meetup and the New York R Conference. He is the author of R for Everyone, a book about R programming geared toward data scientists and nonstatisticians alike. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world and was a member of the 2014 Strata New York selection committee. His writings on statistics can be found online. He was recently featured in the Wall Street Journal for his work with the Minnesota Vikings during the 2015 NFL Draft. Jared holds a master’s degree in statistics from Columbia University and a bachelor’s degree in mathematics from Muhlenberg College.