Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
12:05–12:45 Wednesday, 23 May 2018
Secondary topics: Data Platforms, E-commerce and Retail, Transportation and Logistics

Who is this presentation for?

  • Business intelligence engineers and distributed software developers

Prerequisite knowledge

  • Familiarity with Alluxio and HDFS

What you'll learn

  • How JD.com uses Alluxio as a pluggable optimization component to support ad hoc and real-time stream computing

Description

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Baolong Mao, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. For example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and improved the syncing between Alluxio and HDFS to keep the two consistent.
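
The abstract does not show how the pluggable, fault-tolerant access path is built. As a rough illustration only, the sketch below shows one way the idea could look: an HDFS URL is rewritten to an Alluxio URL, the read is tried against Alluxio first, and any I/O failure falls back to HDFS so that the cache layer stays optional. Everything here is an assumption for illustration and not JD.com's implementation: the names to_alluxio_url and read_with_fallback, the host alluxio-master, and the reader callables are hypothetical. Port 19998 is Alluxio's default master RPC port.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

# Hypothetical Alluxio master address (19998 is Alluxio's default RPC port).
ALLUXIO_AUTHORITY = "alluxio-master:19998"


def to_alluxio_url(hdfs_url: str, authority: str = ALLUXIO_AUTHORITY) -> str:
    """Rewrite hdfs://<namenode>/<path> to alluxio://<authority>/<path>."""
    parts = urlsplit(hdfs_url)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_url!r}")
    # Keep the path, swap the scheme and authority.
    return urlunsplit(("alluxio", authority, parts.path, "", ""))


def read_with_fallback(
    hdfs_url: str,
    read_alluxio: Callable[[str], bytes],
    read_hdfs: Callable[[str], bytes],
) -> bytes:
    """Try the Alluxio path first; on an I/O error, read from HDFS directly.

    The two readers are injected (hypothetical) callables, so the cache
    layer can be plugged in or removed without changing the caller.
    """
    try:
        return read_alluxio(to_alluxio_url(hdfs_url))
    except OSError:
        # Alluxio is unavailable or failed: the job still completes via HDFS.
        return read_hdfs(hdfs_url)


if __name__ == "__main__":
    def broken_alluxio(url: str) -> bytes:
        raise OSError("alluxio master unreachable")  # simulated outage

    def fake_hdfs(url: str) -> bytes:
        return b"rows from " + url.encode()

    print(to_alluxio_url("hdfs://namenode:8020/warehouse/orders"))
    print(read_with_fallback("hdfs://namenode:8020/warehouse/orders",
                             broken_alluxio, fake_hdfs))
```

Injecting the readers keeps the sketch independent of any particular client library; in a real deployment they would wrap whichever Alluxio and HDFS clients the compute framework already uses.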

Baolong Mao

JD.com

Baolong Mao is a big data platform development engineer at JD.com, where he works on the company’s big data platform and focuses on the big data ecosystem. He is an open source developer, an Alluxio PMC member and contributor, and a Hadoop contributor. He’s a fan of technology sharing and open source.

Yiran Wu

JD.com

Yiran Wu is a big data platform development engineer at JD.com, where he mainly builds and develops the company’s big data platform using open source projects such as Hadoop, Spark, Hive, and Alluxio. He focuses on the big data ecosystem and is an open source developer, Alluxio contributor, and Hadoop contributor.

Yupeng Fu

Alluxio

Yupeng Fu is a founding member and senior architect at Alluxio and a PMC member of the Alluxio open source project. Previously, Yupeng worked at Google, where he built big data analytics platforms, and at Palantir, where he led the effort to build the company’s storage solution. Yupeng holds a BS and an MS from Tsinghua University and has completed coursework toward a PhD at UCSD.