Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Machine learning in R

Jared Lander (Lander Analytics)
1:30pm–5:00pm Tuesday, September 26, 2017
Secondary topics: R
Average rating: 3.25 out of 5 (4 ratings)

Who is this presentation for?

  • Data scientists and machine learning practitioners

Prerequisite knowledge

  • A basic understanding of R and linear models

Materials or downloads needed in advance

  • A laptop with R, RStudio (optional), and the following R packages installed: glmnet, coefplot, xgboost, boot, and ggplot2

What you'll learn

  • Understand regularization, boosted trees, and cross-validation


Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today’s incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.


Elastic net

  • Penalized regression with the lasso and ridge methods
  • Fitting models with glmnet
  • The coefficient path
  • Coefficients with coefplot
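
As a rough illustration of the elastic net topics above, the sketch below fits a penalized regression with glmnet, draws the coefficient path, and plots the selected coefficients with coefplot. It is not taken from the tutorial materials: the built-in mtcars data, the choice of predictors, alpha = 0.5, the seed, and the five folds are all assumptions made for the example.

```r
# Sketch only: mtcars ships with R and stands in for the tutorial's data.
library(glmnet)
library(coefplot)

# glmnet expects a numeric predictor matrix and a response vector.
x <- as.matrix(mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")])
y <- mtcars$mpg

# alpha blends the two penalties: 1 is the lasso, 0 is ridge,
# and values in between give the elastic net (0.5 is an arbitrary choice).
fit <- glmnet(x, y, family = "gaussian", alpha = 0.5)

# Coefficient path: each curve traces one coefficient as lambda changes.
plot(fit, xvar = "lambda", label = TRUE)

# Choose lambda by cross-validation; folds are random, so fix a seed.
set.seed(1234)
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 0.5, nfolds = 5)

# Coefficients at the largest lambda within one standard error of the minimum.
coef(cvfit, s = "lambda.1se")

# coefplot draws the coefficients kept at that lambda, sorted by size.
coefplot(cvfit, lambda = "lambda.1se", sort = "magnitude")
```

Setting alpha to 1 or 0 in the same calls gives the pure lasso or ridge fit, which makes it easy to compare how each penalty shrinks the coefficients.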

Boosted trees

  • Making classifications (and regression) using recursive partitioning
  • Fitting models with xgboost
  • Making compelling visualizations with xgb.plot.multi.trees
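
The boosted tree topics above might look something like the following sketch, which trains a small classifier with xgboost and, where possible, draws the combined trees. The mtcars data, the chosen columns, and every parameter value here are assumptions for illustration rather than the tutorial's own code, and xgb.plot.multi.trees needs the optional DiagrammeR package.

```r
# Sketch only: a tiny classification example on the built-in mtcars data.
library(xgboost)

# Predict transmission type (am: 0 = automatic, 1 = manual).
x <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])
y <- mtcars$am

# xgboost works on its own matrix type, which bundles predictors and labels.
dtrain <- xgb.DMatrix(data = x, label = y)

# Each round adds a shallow tree that corrects the errors of the previous ones.
# The parameter values below are illustrative, not tuned.
set.seed(1234)
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2, eta = 0.3),
  data = dtrain,
  nrounds = 20
)

# With a binary:logistic objective, predictions are probabilities of class 1.
pred <- predict(bst, dtrain)
head(round(pred, 3))

# Collapse the ensemble into one diagram; skipped unless running
# interactively with DiagrammeR installed.
if (interactive() && requireNamespace("DiagrammeR", quietly = TRUE)) {
  xgb.plot.multi.trees(bst)
}
```

For regression, the same calls apply with a numeric label and a regression objective in place of binary:logistic; the objective's exact name depends on the installed xgboost version.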


Cross-validation

  • The reasoning for and process behind cross-validation
  • Cross-validating glm models with cv.glm
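
A minimal sketch of cross-validating glm models with cv.glm from the boot package follows. The mtcars data, the two model formulas, the seed, and K = 5 are assumptions chosen for the example and do not come from the tutorial materials.

```r
# Sketch only: 5-fold cross-validation of two glm fits on mtcars.
library(boot)

# A Gaussian glm is an ordinary linear regression.
fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian)

# cv.glm refits the model K times, each time holding out one fold and
# scoring predictions on it (mean squared error by default).
set.seed(1234)
cv5 <- cv.glm(data = mtcars, glmfit = fit, K = 5)

# delta[1] is the raw cross-validated error estimate,
# delta[2] is a bias-adjusted version.
cv5$delta

# Repeat for a smaller model; a lower delta suggests better
# out-of-sample prediction.
fit_small <- glm(mpg ~ wt, data = mtcars, family = gaussian)
set.seed(1234)
cv5_small <- cv.glm(data = mtcars, glmfit = fit_small, K = 5)
cv5_small$delta
```

Because the folds are assigned at random, the error estimates change from run to run unless a seed is set, and with only 32 rows the comparison between models is noisy.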

Jared Lander

Lander Analytics

Jared P. Lander is chief data scientist of Lander Analytics, where he oversees the long-term direction of the company and researches the best strategy, models, and algorithms for modern data needs. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing. In addition to his client-facing consulting and training, Jared is an adjunct professor of statistics at Columbia University and the organizer of the New York Open Statistical Programming Meetup and the New York R Conference. He is the author of R for Everyone, a book about R programming geared toward data scientists and nonstatisticians alike. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world and was a member of the 2014 Strata New York selection committee. His writings on statistics can be found online. He was recently featured in the Wall Street Journal for his work with the Minnesota Vikings during the 2015 NFL Draft. Jared holds a master’s degree in statistics from Columbia University and a bachelor’s degree in mathematics from Muhlenberg College.

Comments on this page are now closed.


Chitra Bhagat
09/26/2017 8:34am EDT

Thank you Sophia

Sophia DeMartini
09/26/2017 7:02am EDT

@Chitra, here is the information the speakers sent out ahead of the tutorial:

Please install the latest versions of R and RStudio BEFORE you arrive onsite. If you have not updated R in a while, now is a great time. Please also install the latest versions of the glmnet, coefplot, xgboost, boot and ggplot2 packages.

You will be working with real data, so please visit ahead of time and download the following data: acsNew.csv, acs_ny.csv, housing1.csv, housingNew.csv, wine.csv, manhattan_Test.csv, manhattan_Train.csv and manhattan_Validate.csv.

Chitra Bhagat
09/26/2017 6:57am EDT

Can you pls share the links with material that is required to be downloaded for this tutorial?