
Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA


Optimizing computing cluster resource utilization with an in-memory distributed filesystem

Yue Li (MemVerge), Shouwei Chen (Rutgers University)

11:00am–11:40am Thursday, March 28, 2019

Data Engineering & Architecture
Location: 2008

Secondary topics: Data Platforms, Retail and e-commerce, Storage

Average rating: 5.00 (4 ratings)

Who is this presentation for?

Senior engineers

Level

Intermediate

Prerequisite knowledge

Familiarity with Spark internals and Spark tuning
A basic understanding of storage

What you'll learn

Learn how JD.com optimizes Spark workloads and builds robust storage clusters for large-scale data
Understand how to develop and tune high-performance clusters

Description

Spark is the most widely used computing framework on JD.com's big data platform, and it depends heavily on memory, which can place a heavy burden on the clusters as a whole. Users typically have to tune every workload manually by increasing the memory or the number of CPU cores allocated to each Spark executor.
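To make the manual tuning concrete, here is a minimal sketch that builds per-workload `spark-submit` arguments. The `spark.executor.memory` and `spark.executor.cores` properties are standard Spark settings; the helper `spark_submit_args` and the job file names are hypothetical illustrations, not part of JD.com's tooling.

```python
from typing import Dict, List, Optional


def spark_submit_args(app_path: str, executor_memory: str, executor_cores: int,
                      extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit command line with per-executor resource settings.

    Hypothetical helper: each workload gets its own memory/core sizing,
    which is the per-job tuning burden the talk describes.
    """
    conf = {
        "spark.executor.memory": executor_memory,     # heap per executor, e.g. "16g"
        "spark.executor.cores": str(executor_cores),  # concurrent tasks per executor
    }
    conf.update(extra_conf or {})
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# Hypothetical jobs: a memory-hungry join needs larger executors than a light ETL job.
heavy = spark_submit_args("join_job.py", "16g", 4)
light = spark_submit_args("etl_job.py", "4g", 2)
```

Every new workload needs its own choice of these numbers, which is why moving shuffle data out of executor memory is attractive.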

JD.com recently designed a new architecture that optimizes its Spark computing clusters by separating the computation stage and the shuffle (spill) stage into different clusters. The architecture implements a shuffle manager that writes shuffle data to an in-memory storage cluster, reducing the memory burden on the computing cluster, and uses fast storage devices to expand the memory capacity of the storage cluster. Yue Li and Shouwei Chen detail the problems the team faced while building it and explain how the company now benefits from the in-memory distributed filesystem. Join in to learn how the system increases memory capacity while decreasing the memory cost of each executor.
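The idea of moving shuffle output off the compute nodes can be illustrated with a small single-process sketch. Everything here is hypothetical: `InMemoryShuffleStore` stands in for the remote in-memory storage cluster, and `map_task`/`reduce_task` are simplified stand-ins for Spark's map and reduce sides. It is not JD.com's or MemVerge's implementation, and Spark's real shuffle manager is a JVM interface that is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BlockKey = Tuple[int, int, int]  # (shuffle_id, map_id, reduce_id)


class InMemoryShuffleStore:
    """Hypothetical stand-in for a separate in-memory storage cluster.

    In the described architecture these blocks would live on other nodes;
    a dict keeps this sketch runnable in one process.
    """

    def __init__(self) -> None:
        self._blocks: Dict[BlockKey, List[Tuple[str, int]]] = {}

    def put(self, key: BlockKey, records: Iterable[Tuple[str, int]]) -> None:
        self._blocks[key] = list(records)

    def get(self, key: BlockKey) -> List[Tuple[str, int]]:
        return self._blocks.get(key, [])


def map_task(store: InMemoryShuffleStore, shuffle_id: int, map_id: int,
             records: Iterable[Tuple[str, int]], num_reducers: int) -> None:
    """Hash-partition map output and write every partition to the store,
    instead of buffering or spilling it in the executor's own memory."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    for reduce_id in range(num_reducers):
        store.put((shuffle_id, map_id, reduce_id), buckets[reduce_id])


def reduce_task(store: InMemoryShuffleStore, shuffle_id: int, reduce_id: int,
                num_maps: int) -> Dict[str, int]:
    """Fetch this reducer's partition from every map output and sum values by key."""
    totals: Dict[str, int] = defaultdict(int)
    for map_id in range(num_maps):
        for key, value in store.get((shuffle_id, map_id, reduce_id)):
            totals[key] += value
    return dict(totals)


# Usage: a tiny word count with two map tasks and two reducers.
store = InMemoryShuffleStore()
inputs = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]
for map_id, recs in enumerate(inputs):
    map_task(store, 0, map_id, recs, num_reducers=2)
result: Dict[str, int] = {}
for reduce_id in range(2):
    result.update(reduce_task(store, 0, reduce_id, num_maps=len(inputs)))
```

Because every key lands in exactly one reducer's partition, merging the reducers' outputs yields the full word count. The compute side holds only the records it is currently processing; the shuffle blocks live in the store.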

Yue Li

MemVerge

Yue Li is a cofounder at MemVerge, where together with his colleagues, he's developing the company's core technologies. Previously, he was a senior postdoctoral fellow at the California Institute of Technology. He has extensive research experience in both the theoretical and experimental aspects of algorithms for nonvolatile memories. Yue holds a PhD in computer science from Texas A&M University and a bachelor's degree in computer science from Huazhong University of Science & Technology.

Shouwei Chen

Rutgers University

Shouwei Chen is an ECE PhD student at Rutgers University, advised by Ivan Rodero. Shouwei’s research focuses on the codesign of a memory-centric computing framework with an in-memory distributed filesystem.


©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com