Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Tutorials & Conference
Beijing, China

Scaling R faster and larger using Apache Spark

This session will be presented in Chinese.

Xiaoyong Zhu (Microsoft)
11:15–11:55 Friday, 2017-07-14
Data science & advanced analytics
Location: Function Room 5B+C
Level: Intermediate
Average rating: ***** (5.00, 1 rating)

Prerequisite knowledge

A basic understanding of R, Spark, and machine learning

What you'll learn

Learn how to use R to analyze terabytes of data

Description

R is a popular data science tool for data analysis. However, its drawbacks, such as its memory utilization and single-threaded design, limit its use for big data analysis. Xiaoyong Zhu explains how to use R to analyze terabytes of data. The talk:

- Covers the design principles and architecture of Microsoft R Server and its integration with Apache Spark.

- Demonstrates how to use R Server to perform scalable machine learning on top of Apache Spark and to analyze terabyte-scale data with the R language.

Xiaoyong Zhu

Microsoft

Xiaoyong Zhu is a program manager at Microsoft focusing on scalable machine learning and advanced analytics.
