Presented by O'Reilly and Cloudera
Make Data Work

When Hadoop meets object storage: Implementation principles, pitfalls, and performance optimization

This will be presented in Chinese

余根茂 (Alibaba Cloud), Haifeng Chen (Intel)
16:20–17:00 Friday, 2017-07-14
Hadoop internals & development
Location: Function Room 6A+B  Level: Intermediate

Prerequisite Knowledge


What you'll learn


Description

费辉 and Haifeng Chen explore how object storage systems differ from HDFS in implementation, explain how to avoid the pitfalls of using object storage with Hadoop, and share optimization methods for specific usage scenarios.

Topics include:

  • Hadoop and object storage: Well-known object storage systems (e.g., AWS S3, Azure Storage, and Alibaba Cloud OSS) and common ways of using object storage in a Hadoop platform
  • Implementation differences between object storage and traditional filesystems: Object storage has no concept of a directory, so the filesystem layer built on top of it must add one. The ways objects are uploaded and downloaded, and the implementation details of operations such as delete and rename, also differ from those in HDFS.
  • The Alibaba Cloud OSS filesystem, a new feature in Hadoop 3.0, and how it is implemented
  • A performance comparison between the Alibaba Cloud OSS filesystem and HDFS (TestDFSIO, TeraSort, and TPC-DS) and analysis of the key differences in performance
  • Performance optimizations in the Alibaba Cloud OSS filesystem (input format, output committer, read cache, multipart upload, etc.)
  • Summary
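
The point about directories and rename can be illustrated with a minimal sketch. It is not the Hadoop OSS filesystem code: `FlatObjectStore` and `DirectoryView` are hypothetical names, and the store is an in-memory dictionary standing in for a real service. It assumes a common emulation scheme in which a directory is a zero-byte marker object whose key ends in "/", listing is a prefix scan, and rename is a copy followed by a delete for every object under the prefix.

```python
class FlatObjectStore:
    """In-memory stand-in for an object store: a flat map from key to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def list_keys(self, prefix=""):
        # The only "query" a flat store offers: all keys sharing a prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


class DirectoryView:
    """Hypothetical filesystem layer that adds directories on top of flat keys."""

    def __init__(self, store):
        self.store = store

    def mkdir(self, path):
        # Assumption: a directory is a zero-byte marker object ending in "/".
        self.store.put(path.rstrip("/") + "/", b"")

    def listdir(self, path):
        prefix = path.rstrip("/") + "/" if path else ""
        children = set()
        for key in self.store.list_keys(prefix):
            rest = key[len(prefix):]
            if not rest:
                continue  # skip the directory marker itself
            children.add(rest.split("/", 1)[0])  # keep only the first path segment
        return sorted(children)

    def rename(self, src, dst):
        # No atomic rename exists here: each object is copied, then deleted.
        src_prefix = src.rstrip("/") + "/"
        dst_prefix = dst.rstrip("/") + "/"
        moved = 0
        for key in self.store.list_keys(src_prefix):
            new_key = dst_prefix + key[len(src_prefix):]
            self.store.put(new_key, self.store.get(key))
            self.store.delete(key)
            moved += 1
        return moved


store = FlatObjectStore()
fs = DirectoryView(store)
fs.mkdir("warehouse/tmp")
store.put("warehouse/tmp/part-0000", b"a")
store.put("warehouse/tmp/part-0001", b"b")
print(fs.listdir("warehouse/tmp"))
print(fs.rename("warehouse/tmp", "warehouse/final"))
```

In this sketch the cost of `rename` grows with the number and size of objects under the source prefix, whereas a rename in HDFS is a metadata operation. That difference is one reason components that rely on rename, such as output committers, tend to show up in discussions of object storage performance.
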
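
余根茂

余根茂 joined Taobao's technology department in March 2014, focusing on building Spark clusters and services within the group. Since joining Alibaba Cloud in May 2015, 余根茂 has worked on providing open source computing services on the public cloud, with a focus on distributed computing, and contributes to the Apache Hadoop and Spark communities.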

Haifeng Chen


Haifeng Chen is a senior software architect at Intel’s Asia Pacific R&D Center. He has more than 12 years’ experience in software design and development, big data, and security, with a particular interest in image processing. Haifeng is the author of ColorStorm, an image browsing, editing, and processing application.


