Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Optimizing computing cluster resource utilization with an in-memory distributed filesystem

Yue Li (MemVerge), Shouwei Chen (Rutgers University)
11:00am-11:40am Thursday, March 28, 2019
Average rating: *****
(5.00, 4 ratings)

Who is this presentation for?

  • Senior engineers

Level

Intermediate

Prerequisite knowledge

  • Familiarity with Spark internals and Spark tuning
  • A basic understanding of storage

What you'll learn

  • Learn how JD.com optimizes Spark workloads and builds robust storage clusters for very large datasets
  • Understand how to develop and tune high-performance clusters

Description

Spark, the most common computing framework on JD.com’s big data platform, depends heavily on memory resources. This can place a heavy burden on the clusters as a whole. Users usually need to configure every workload manually, increasing the memory or CPU cores of each Spark executor.
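The manual tuning described above can be pictured with a small sketch. It assumes a job submitted through spark-submit and uses the standard Spark properties spark.executor.memory, spark.executor.cores, and spark.executor.instances. The helper names, the job file, and all numbers are hypothetical and are not taken from JD.com's setup.

```python
def executor_conf(memory_gb, cores, instances):
    """Per-executor resource settings, keyed by standard Spark property names."""
    return {
        "spark.executor.memory": f"{memory_gb}g",
        "spark.executor.cores": str(cores),
        "spark.executor.instances": str(instances),
    }


def spark_submit_args(app, conf):
    """Turn a property dict into a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical workload: the same job before and after manual tuning.
baseline = executor_conf(memory_gb=4, cores=2, instances=50)
tuned = executor_conf(memory_gb=16, cores=4, instances=50)

baseline_args = spark_submit_args("job.py", baseline)
tuned_args = spark_submit_args("job.py", tuned)

# Extra executor heap the cluster must provide for this one tuned workload.
extra_heap_gb = (16 - 4) * 50
```

Raising the per-executor setting multiplies across every executor, which is one way to see why per-workload tuning can weigh on the cluster as a whole.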

JD.com recently designed a new architecture that optimizes Spark computing clusters by separating the computing stage and the shuffle (spill) stage into different clusters. The architecture's shuffle manager writes data out to an in-memory storage cluster, which reduces the memory burden on the computing cluster, and a fast storage device extends the memory space of the storage cluster. Yue Li and Shouwei Chen detail the problems the team faced when building it and explain how the company now benefits from the in-memory distributed filesystem. Join in to learn how the system increases memory capacity while decreasing the memory cost of each executor.
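As a rough illustration of the separation described above, the toy model below has map-side writers ship partitioned shuffle blocks to a separate store that keeps blocks in memory while they fit and places the rest on a fast-device tier. It is a minimal sketch under stated assumptions, not JD.com's or MemVerge's implementation: the class names, the JSON serialization, the CRC-based partitioner, and the capacity value are all hypothetical.

```python
import json
import zlib


class TieredShuffleStore:
    """Toy stand-in for the storage cluster: a memory tier plus a fast-device tier."""

    def __init__(self, memory_capacity_bytes):
        self.memory_capacity = memory_capacity_bytes
        self.memory_used = 0
        self.memory_tier = {}  # block_id -> bytes held in memory
        self.device_tier = {}  # block_id -> bytes held on the fast device

    def put(self, block_id, data):
        # Keep the block in memory if it fits; otherwise place it on the fast device.
        if self.memory_used + len(data) <= self.memory_capacity:
            self.memory_tier[block_id] = data
            self.memory_used += len(data)
        else:
            self.device_tier[block_id] = data

    def get(self, block_id):
        if block_id in self.memory_tier:
            return self.memory_tier[block_id]
        return self.device_tier[block_id]


def partition_of(key, num_partitions):
    # Deterministic partitioner (Python's built-in hash() is salted for strings).
    return zlib.crc32(str(key).encode()) % num_partitions


def write_map_output(store, map_id, num_partitions, records):
    """Map side: bucket records by partition and ship each bucket to the store."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_of(key, num_partitions)].append([key, value])
    for partition, bucket in enumerate(buckets):
        store.put((map_id, partition), json.dumps(bucket).encode())


def read_partition(store, map_ids, partition):
    """Reduce side: fetch one partition's blocks from every map task."""
    records = []
    for map_id in map_ids:
        blocks = json.loads(store.get((map_id, partition)).decode())
        records.extend((key, value) for key, value in blocks)
    return records


# Hypothetical run: two map tasks, two partitions, a deliberately tiny memory tier.
store = TieredShuffleStore(memory_capacity_bytes=16)
write_map_output(store, 0, 2, [("a", 1), ("b", 2)])
write_map_output(store, 1, 2, [("a", 3), ("c", 4)])
all_records = read_partition(store, [0, 1], 0) + read_partition(store, [0, 1], 1)
```

In this model the executors hold no shuffle data after writing, so their memory needs are set by computation alone, while the store's usable capacity is its memory plus whatever the fast-device tier adds.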

Yue Li

MemVerge

Yue Li is a cofounder at MemVerge, where, together with his colleagues, he’s developing the company’s core technologies. Previously, he was a senior postdoctoral fellow at the California Institute of Technology. He has extensive research experience in both theoretical and experimental aspects of algorithms for nonvolatile memories. Yue holds a PhD in computer science from Texas A&M University and a bachelor’s degree in computer science from Huazhong University of Science & Technology.

Shouwei Chen

Rutgers University

Shouwei Chen is an ECE PhD student at Rutgers University, advised by Ivan Rodero. Shouwei’s research focuses on the codesign of a memory-centric computing framework with an in-memory distributed filesystem.