Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)
2:00pm–2:40pm Thursday, 09/13/2018
Data engineering and architecture
Location: 1E 09 Level: Beginner
Secondary topics:  Data Platforms, Retail and e-commerce, Transportation and Logistics

Who is this presentation for?

  • BI engineers and distributed software developers

Prerequisite knowledge

  • Familiarity with Alluxio, Presto, and HDFS

What you'll learn

  • Learn how to use Alluxio as a pluggable optimization component
  • Understand how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing while ensuring consistency between Alluxio and HDFS

Description

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. For example, one framework, JDPresto, has seen a 10x performance improvement on average. The team has also extended Alluxio and improved the syncing between Alluxio and HDFS to keep the two consistent.
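
The idea of a fault-tolerant, pluggable cache layer addressed through HDFS-compatible URLs can be illustrated with a minimal sketch. This is not JD.com's implementation, which the abstract does not detail. It assumes a hypothetical Alluxio master address (`alluxio-master:19998`) and a caller-supplied `read` function standing in for a real filesystem client. A reader tries the Alluxio copy of a path first and falls back to the original HDFS URL if the cache layer fails.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical Alluxio master address, used for illustration only.
ALLUXIO_AUTHORITY = "alluxio-master:19998"


def to_alluxio_url(hdfs_url: str, authority: str = ALLUXIO_AUTHORITY) -> str:
    """Map an hdfs:// URL to an alluxio:// URL that keeps the same path."""
    parsed = urlparse(hdfs_url)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_url!r}")
    return f"alluxio://{authority}{parsed.path}"


def read_with_fallback(hdfs_url: str, read: Callable[[str], bytes]) -> bytes:
    """Try the Alluxio copy first; fall back to HDFS if the cache layer fails.

    `read` is a hypothetical stand-in for a real filesystem client call.
    """
    try:
        return read(to_alluxio_url(hdfs_url))
    except OSError:
        # Cache layer unavailable: read directly from the underlying HDFS path.
        return read(hdfs_url)
```

Because the caller only ever names the HDFS URL, the cache layer in this sketch can be added or removed without changing job definitions, which is one way to read "pluggable" here.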

Tao Huang

JD.com

Tao Huang is a big data platform development engineer at JD.com, where he is mainly engaged in the development and maintenance of the company’s big data platform, using open source projects such as Hadoop, Spark, Alluxio, and Kubernetes. He focuses on migrating Hadoop to a Kubernetes cluster that will run both long-running services and batch jobs, in order to improve cluster resource utilization.

Mang Zhang

JD.com

Mang Zhang is a big data platform development engineer at JD.com, where he is mainly engaged in the construction and development of the company’s big data platform, using open source projects such as Hadoop, Spark, Hive, Alluxio, and Presto. He focuses on the big data ecosystem and is an open source developer who contributes to Alluxio, Hadoop, Hive, and Presto.

Bing Bai

JD.com

Bing Bai is a senior big data platform development engineer at JD.com focusing on computation and storage frameworks such as Spark, Hive, Presto, Alluxio, and HDFS. He is experienced in designing and developing architecture for deploying these frameworks into production on large-scale clusters.