Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Machine learning in R

Jared Lander (Lander Analytics)
9:00am–12:30pm Tuesday, December 5, 2017
Data science and advanced analytics, Machine Learning
Location: 321/322
Level: Intermediate

Who is this presentation for?

  • Data scientists and machine learning practitioners

Prerequisite knowledge

  • A basic understanding of R, linear models, and classification

Materials or downloads needed in advance

  • A laptop with R, RStudio, and the following R packages installed: glmnet, coefplot, xgboost, boot, and ggplot2

What you'll learn

  • Understand regularization, boosted trees, and cross-validation

Description

Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today’s incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.

Outline

Elastic net

  • Penalized regression with the lasso and ridge methods
  • Fitting models with glmnet
  • The coefficient path
  • Coefficients with coefplot

Boosted trees

  • Classification (and regression) using recursive partitioning
  • Fitting models with xgboost
  • Making compelling visualizations with xgb.plot.multi.trees

Cross-validation

  • The reasoning for and process behind cross-validation
  • Cross-validating glm models with cv.glm

Jared Lander

Lander Analytics

Jared P. Lander is chief data scientist of Lander Analytics, where he oversees the long-term direction of the company and researches the best strategy, models, and algorithms for modern data needs. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing. In addition to his client-facing consulting and training, Jared is an adjunct professor of statistics at Columbia University and the organizer of the New York Open Statistical Programming Meetup and the New York R Conference. He is the author of R for Everyone, a book about R programming geared toward data scientists and nonstatisticians alike. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world and was a member of the 2014 Strata New York selection committee. His writings on statistics can be found at Jaredlander.com. He was recently featured in the Wall Street Journal for his work with the Minnesota Vikings during the 2015 NFL Draft. Jared holds a master’s degree in statistics from Columbia University and a bachelor’s degree in mathematics from Muhlenberg College.
