Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Usages and Optimizations of Spark at Tencent

LianHui Wang (Tencent)
4:50pm–5:30pm Thursday, 02/19/2015
Spark in Action
Location: 210 C/G

Tencent runs one of the world’s largest social networks, with over 800 million active users, and our data systems ingest 700 TB of data per day. Spark is an important part of those systems, with applications in ad optimization, business intelligence, personalization, and recommendation systems.

In this talk, we introduce Tencent’s general data architecture, focusing on our Spark use cases on a cluster of 8000+ nodes managed by GAIA, our improved resource manager based on YARN. We contrast Spark with our previous MapReduce use cases, then cover tuning methods and optimizations for large-scale clusters.

LianHui Wang


LianHui Wang is a software engineer in Tencent’s TEG Big Data Department. He is also a contributor to Spark, Hadoop, and Hive. He has been instrumental in Tencent’s Hadoop and Spark applications, with a focus on graph computation, machine learning, and SparkSQL.