Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Tutorials & Conference
Beijing, China

A Spark-based data management, exploration, and computing platform

This will be presented in Chinese

XueMin Zhang (TalkingData)
14:00–14:40 Saturday, 2017-07-15
Spark & beyond
Location: Function Room 8A+8B
Level: Beginner

Prerequisite Knowledge

Familiarity with open source technologies such as Spark, Alluxio, and Jenkins

What you'll learn

Learn how TalkingData built a computing platform on Spark, and understand the trade-offs behind its architectural choices

Description

TalkingData began adopting Spark at the end of 2013, and all data processing in its data center has since been migrated to the Spark computing platform. As the business grew rapidly and its data sources and data volume increased sharply, the workload for data asset management, data analysis, and data mining grew steadily. Out of that work, a data management, exploration, and computing platform built on Spark, Alluxio, Jenkins, and other open source technologies gradually took shape.

XueMin Zhang offers an overview of TalkingData's platform, covering its background and the evolution of its technical architecture, along with some pitfalls encountered along the way and plans for what comes next.

XueMin Zhang

TalkingData

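XueMin Zhang has more than six years of experience in software development and management. Zhang previously served as big data team leader in Sina's platform architecture department, responsible for Weibo's core data storage and big data computing solutions, and earlier worked as a development engineer at Jiuqi (久其) and Ruian Technology (锐安科技), building up extensive software development and project experience. Zhang currently works at TalkingData DTU, focuses on the big data field, and has studied the maintenance and development of Hadoop, Spark, and HBase in depth.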
