Sep 23–26, 2019

Kubernetes for stateful MPP systems

Paige Roberts (Vertica), Deepak Majeti (Vertica)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1E 07/08
Average rating: *****
(5.00, 1 rating)

Who is this presentation for?

  • Software engineers, architects, and product managers




Containers decouple applications from the underlying infrastructure. With the advent of low-cost public infrastructure providers such as Amazon and Google, many applications are being modified to run inside containers for simpler, faster deployment on any platform. Paired with deployment tools such as Kubernetes, containers also let applications scale quickly in the cloud. Decoupling distributed databases from the underlying infrastructure would bring many benefits, including the ability to run analytics at scale on any hardware. Kubernetes could also make recovery automatic in cloud deployments, making applications far more resilient.

However, Kubernetes initially supported only applications that could be decomposed into independent, stateless microservices. Databases do not fit that mold: spikes in demand hit database users hard, and without proper recovery, node failures can bog down whole clusters. GoodData, for example, found that node failures in the cloud could affect its Vertica MPP database, which hurt customer satisfaction. The Vertica R&D team set out to make failure handling seamless and node recovery automatic.

Kubernetes was the obvious choice, but it is traditionally used for microservices, not for something like a stateful MPP database that might need hundreds of containers. Merging the power of an MPP analytics database with the flexibility of Kubernetes meant overcoming a lot of hurdles. Paige Roberts and Deepak Majeti detail the networking, storage, and operational challenges encountered while extending a stateful distributed database system to work with containers and Kubernetes. They also explore one implementation, used at GoodData, that overcomes these challenges and serves as a practical example of how this can work.

You’ll hear about the mistakes made and the lessons learned along the way, so you can avoid repeating them when incorporating Kubernetes into your own software architecture.

Prerequisite knowledge

  • Familiarity with software development

What you'll learn

  • Learn why you would want to put a large stateful application into containers and Kubernetes, and how to go about it
  • Discover which strategies will benefit you and which will cause serious problems further down the road

Paige Roberts


Paige Roberts is an open source relations manager at Vertica, where she promotes understanding of the company, MPP data processing, open source, high-scale data engineering, and how the analytics revolution is changing the world. In 23 years in the data management industry, Paige has worked as an engineer, a trainer, a support technician, a technical writer, a marketer, a product manager, and a consultant. She’s built data engineering pipelines and architectures, documented and tested open source analytics implementations, spun up Hadoop clusters, picked the brains of stars in data analytics and engineering, worked with a lot of different industries, and questioned a lot of assumptions.

Deepak Majeti


Deepak Majeti is a systems software engineer at Vertica. He’s also an active contributor to Hadoop’s two most popular file formats: ORC and Parquet. His interests lie in getting the best from high-performance computing (HPC) and big data by building scalable, high-performance, and energy-efficient data analytics tools for modern computer architectures. Deepak holds a PhD in the HPC domain from Rice University.


Arnon Shimoni | 10/10/2019 4:10am EDT

Is it possible to get a recording or slides for this talk?
