Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Big data analytics in the public cloud: Challenges and opportunities

Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)

11:15–11:55 Thursday, 2 May 2019

Data Engineering and Architecture
Location: S11 B

Secondary topics: AI and Data technologies in the cloud

Average rating: 4.50 (2 ratings)

Who is this presentation for?

Big data software architects and those working at public cloud service providers

Level

Intermediate

Prerequisite knowledge

Familiarity with big data, the cloud, and benchmarking

What you'll learn

Understand performance gaps in the public cloud
Explore an in-memory data accelerator built with new hardware like persistent memory and RDMA NICs that improves the performance of big data analytics workloads in the cloud and enables new use cases

Description

Cloud-based big data analytics is growing faster than traditional on-premises solutions, as it provides excellent scalability, simplifies management, and reduces costs. Public cloud adoption has become the top priority for big data investments. However, performance and feature gaps still exist that must be resolved.

Jian Zhang, Chendi Xue, and Yuan Zhou explore the performance and feature challenges that arise when migrating big data analytics workloads to the cloud, including the disaggregated object storage commonly used by public cloud service providers (CSPs), the cloud connectors between big data frameworks and cloud storage, and compute service orchestration (e.g., running Spark on Kubernetes). They then trace the evolution of big data analytics in the public cloud and reveal the root causes of the performance gaps seen in typical workloads (TeraSort, DFSIO, TPC-DS, and k-means) in different scenarios. They conclude with a discussion of a new in-memory data accelerator: a high-performance layer that leverages state-of-the-art technologies like persistent memory and RDMA to accelerate ephemeral data access. You’ll see promising performance numbers from prototypes that illustrate how this approach enables hybrid transactional analytical processing (HTAP) workloads in the cloud. Along the way, you’ll learn how to leverage new hardware technologies like persistent memory and RDMA for big data analytics in the cloud.

Jian Zhang

Intel

Jian Zhang is a senior software engineering manager at Intel, where he and his team focus primarily on open source storage development and optimization on Intel platforms and build reference solutions for customers. He has 10 years of experience in performance analysis and optimization for open source projects like Xen, KVM, Swift, and Ceph, as well as in working with the Hadoop Distributed File System (HDFS) and benchmarking workloads like SPEC and TPC. Jian holds a master’s degree in computer science and engineering from Shanghai Jiao Tong University.

Chendi Xue

Intel

Chendi Xue is a software engineer on the data analytics team at Intel. She has more than five years’ experience in big data and cloud system optimization, focusing on performance analysis and optimization of the storage and network software stacks. Her development work includes Spark shuffle optimization, columnar-based execution in Spark SQL, a compute-side cache implementation, and a storage benchmark tool. Previously, during her master’s degree studies, she worked on Linux device mapper and iSCSI optimization.

Yuan Zhou

Intel

Yuan Zhou is a senior software development engineer in the Software and Service Group at Intel, where he works on the Open Source Technology Center team primarily focused on big data storage software. He’s been working in databases, virtualization, and cloud computing for most of his 7+ year career at Intel.

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com