Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Recent developments in SparkR for advanced analytics

Xiangrui Meng (Databricks)
1:15pm–1:55pm Thursday, 09/29/2016
Data science & advanced analytics
Location: 3D 10
Level: Intermediate
Tags: r-lang
Average rating: 4.00 (2 ratings)

Prerequisite knowledge

  • A general understanding of data analysis in R and Spark

What you'll learn

  • Understand the basics of advanced analytics in SparkR: what advanced analytic features SparkR supports, how the features were implemented, and how to integrate SparkR with existing R packages

Description

    Since its introduction in Spark 1.4, SparkR has received contributions from both the Spark community and the R community. Xiangrui Meng explores recent community efforts to extend SparkR for scalable advanced analytics—including summary statistics, single-pass approximate algorithms, and machine-learning algorithms ported from Spark MLlib—and shows how to integrate existing R packages with SparkR to accelerate existing R workflows.

    Xiangrui Meng


    Xiangrui Meng is an Apache Spark PMC member and a software engineer at Databricks. His main interests center on developing and implementing scalable algorithms for scientific applications. Xiangrui has been actively involved in the development and maintenance of Spark MLlib since he joined Databricks. Previously, he worked as an applied research engineer at LinkedIn, where he was the main developer of an offline machine learning framework in Hadoop MapReduce. He holds a PhD from Stanford, where he worked on randomized algorithms for large-scale linear regression problems.

    Comments on this page are now closed.


    André Morrow
    10/04/2016 12:36pm EDT

    All Strata + Hadoop World 2016 slide presentations have now been posted if they were made available to us.

    Arun Venkateswaran
    10/04/2016 12:27pm EDT


    Could you post the slides of this session please?