Kubernetes for Stateful MPP systems
Who is this presentation for?
Software Engineers, Architects, Product Managers
Prerequisite knowledge
A general familiarity with software development.
What you'll learn
Containers decouple applications from the underlying infrastructure. With the advent of low-cost public infrastructure providers such as Amazon and Google, many applications are now being modified to run inside containers to enable simpler and faster deployment on any platform. With the aid of deployment tools such as Kubernetes, containers also enable applications to scale quickly in the cloud.
Decoupling distributed databases from the underlying infrastructure would provide many benefits. You could run analytics on any hardware at scale, for instance. Kubernetes could also make recovery in cloud deployments automatic, making applications far more resilient.
However, Kubernetes initially supported only applications that could be decomposed into micro-services, which are independent and stateless.
Spikes in demand hit database users hard, and without proper recovery, node failures can bog down whole clusters. GoodData, for example, saw that node failures in the cloud could affect their Vertica MPP database, which reduced customer satisfaction.
The Vertica R&D team set out to find a way to make failure handling seamless and node recovery automatic.
Kubernetes was the obvious choice, but it is traditionally used for micro-services, not for something like a stateful MPP database that might need hundreds of containers. To merge the power of an MPP analytics database with the flexibility of Kubernetes, a lot of hurdles had to be overcome.
In this talk, you will learn about the challenges with networking, storage, and operational complexity that we encountered while extending a stateful distributed database system to work with containers and Kubernetes. We will also describe one implementation, GoodData, that overcomes these challenges and serves as a practical example of how this can work.
This presentation will explore some of the mistakes we made and the lessons we learned along the way, so that you can avoid the same mistakes when incorporating Kubernetes into your software architecture.
In two decades in the data management industry, I have worked as an engineer, a trainer, a marketer, a product manager, and a consultant. Now, I promote understanding of Vertica, MPP data processing, open source, and how the analytics revolution is changing the world.
Deepak Majeti is a systems software engineer at Vertica. He is also an active contributor to Hadoop’s two most popular file formats: ORC and Parquet. His interests lie in getting the best from high-performance computing (HPC) and big data by building scalable, high-performance, and energy-efficient data analytics tools for modern computer architectures. Deepak holds a PhD in the HPC domain from Rice University.
View a complete list of Strata Data Conference contacts