Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks

Tao Huang (JD.com), Mang Zhang (JD.com), Bing Bai (JD.com)

2:00pm–2:40pm Thursday, 09/13/2018

Data engineering and architecture
Location: 1E 09
Level: Beginner

Secondary topics: Data Platforms, Retail and e-commerce, Transportation and Logistics

Who is this presentation for?

BI engineers and distributed software developers

Prerequisite knowledge

Familiarity with Alluxio, Presto, and HDFS

What you'll learn

Learn how to use Alluxio as a pluggable optimization component
Understand how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing while ensuring consistency between Alluxio and HDFS

Description

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 nodes and a total capacity of 210 PB.

Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. Tao Huang, Mang Zhang, and Bing Bai explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. As one example, the JDPresto framework has seen a 10x performance improvement on average. The team has also extended Alluxio and improved the syncing between Alluxio and HDFS to keep the two consistent.
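The abstract does not describe how the pluggable, fault-tolerant behavior is implemented. As a rough illustration of the general idea only, the sketch below rewrites an HDFS URL to an Alluxio URL and falls back to HDFS when the cache layer cannot be reached. Every name in it (`ALLUXIO_AUTHORITY`, `to_alluxio_uri`, `read_with_fallback`, the reader callbacks, the host names) is hypothetical, and it assumes the Alluxio namespace mirrors the HDFS paths one-to-one. It is not JD.com's code and does not call any Alluxio or HDFS client library.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

# Hypothetical cache endpoint; the host name is made up for this sketch.
# 19998 is Alluxio's default master RPC port.
ALLUXIO_AUTHORITY = "alluxio-master:19998"


def to_alluxio_uri(hdfs_uri: str, authority: str = ALLUXIO_AUTHORITY) -> str:
    """Map an hdfs:// URI to the same path under an alluxio:// authority.

    Assumes the HDFS root is mounted at the Alluxio root, so paths match.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    return urlunsplit(("alluxio", authority, parts.path, "", ""))


def read_with_fallback(
    hdfs_uri: str,
    read_alluxio: Callable[[str], bytes],
    read_hdfs: Callable[[str], bytes],
) -> bytes:
    """Try the cache layer first; on an I/O failure, read HDFS directly.

    The two readers are caller-supplied stand-ins for real client calls.
    """
    try:
        return read_alluxio(to_alluxio_uri(hdfs_uri))
    except OSError:
        # Cache unavailable: the job still completes, just without speedup.
        return read_hdfs(hdfs_uri)
```

Because the caller keeps passing the original HDFS URL, the cache layer can be added or removed without changing job definitions, which is one plausible reading of "pluggable" here.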

Tao Huang

JD.com

Tao Huang is a big data platform development engineer at JD.com, where he develops and maintains the company’s big data platform using open source projects such as Hadoop, Spark, Alluxio, and Kubernetes. He focuses on migrating Hadoop to a Kubernetes cluster, which will run both long-running services and batch jobs, to improve cluster resource utilization.

Mang Zhang

JD.com

Mang Zhang is a big data platform development engineer at JD.com, where he builds and develops the company’s big data platform using open source projects such as Hadoop, Spark, Hive, Alluxio, and Presto. He focuses on the big data ecosystem and is an open source contributor to Alluxio, Hadoop, Hive, and Presto.

Bing Bai

JD.com

Bing Bai is a senior big data platform development engineer at JD.com focusing on computation and storage frameworks such as Spark, Hive, Presto, Alluxio, and HDFS. He is experienced in designing and developing architecture for deploying these frameworks into production on large-scale clusters.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com