Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Building containerized Spark on a solid foundation with Quobyte and Kubernetes

Daniel Bäurer (inovex GmbH), Sascha Askani (inovex GmbH)
16:35–17:15 Wednesday, 24 May 2017
Big data and the Cloud
Location: Capital Suite 13
Level: Intermediate

Who is this presentation for?

  • Cloud platform engineers and big data systems engineers

Prerequisite knowledge

  • Basic knowledge of Kubernetes, Spark, and HDFS concepts

What you'll learn

  • Explore a containerized alternative to the storage options traditionally used for distributed storage with Spark


There are many challenges when deploying distributed applications on containers. One of the biggest is the lack of a stable, performant distributed filesystem. HDFS works very well for legacy Hadoop installations on commodity hardware in classic IT environments, because it makes storing large amounts of data directly on the compute nodes cheap and keeps that data close to the computation (data locality). Cloud-native environments, however, do not let HDFS play to these strengths: keeping data on the compute nodes runs contrary to the idea behind containers and cloud infrastructure. For this reason, many cloud-first implementations fall back to object stores such as Amazon S3, Google Cloud Storage, or OpenStack Swift for persistence. These solutions, however, lack many features of a real filesystem and suffer from low performance due to their overhead.

Daniel Bäurer and Sascha Askani share a solution that runs Spark on Kubernetes with Quobyte, an advanced, distributed, software-defined storage system, to deliver elastic and stable Spark performance in a container environment.

Daniel Bäurer

inovex GmbH

Daniel Bäurer is head of operations at inovex GmbH. Daniel has been designing and operating complex systems for over 15 years. He currently focuses on data center automation and Hadoop platforms.

Sascha Askani

inovex GmbH

Sascha Askani is a senior systems engineer at inovex GmbH. Sascha has a strong background in storage and disaster recovery and has helped various customers master their digital transformation challenges. He now focuses on solutions for his customers’ big data needs, with an emphasis on distributed storage.