Deploying distributed applications on containers poses many challenges. One of the biggest is the lack of a stable and performant distributed filesystem. HDFS works very well for legacy Hadoop installations on commodity hardware in classic IT environments, since it is very cheap to store large amounts of data directly on the compute nodes (data locality), but cloud-native environments do not let HDFS play to its strengths. Data locality on compute nodes, for example, runs contrary to the idea behind containers and cloud infrastructure. For this reason, many cloud-first implementations fall back on object stores like Amazon S3, Google Cloud Storage, or OpenStack Swift for persistence. Those solutions, however, lack many features of a real filesystem and suffer from low performance due to overhead.
Daniel Bäurer and Sascha Askani share a solution that runs Spark on Kubernetes with Quobyte, an advanced, distributed, software-defined storage system, to deliver elastic and stable Spark performance in a container environment.
Daniel Bäurer is head of operations at inovex GmbH. Daniel has been designing and operating complex systems for over 15 years. He currently focuses on data center automation and Hadoop platforms.
Sascha Askani is a senior systems engineer at inovex GmbH. Sascha has a strong storage and disaster recovery background and has helped various customers master their digital transformation challenges. He now focuses on solutions for his customers’ big data needs, with an emphasis on distributed storage solutions.