Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Making stateless containers reliable and available even with stateful applications

Paul Curtis (Weaveworks)
16:35–17:15 Wednesday, 23 May 2018

Who is this presentation for?

  • Enterprise architects, systems managers, and system administrators

Prerequisite knowledge

  • Familiarity with container strategies and tools, such as Docker, Kubernetes, OpenShift, and Mesos

What you'll learn

  • Learn how to handle persistent data in big data environments that span multiple data centers or geographic locations


Persisting large amounts of state information inside a container is a classic way to make a container system work poorly. Container systems assume that containers are ephemeral: it's OK to kill one if resource conflicts come up, and it can be restarted quickly on pretty much any available host. A container holding lots of state that must be reconstituted on restart cannot be restarted quickly. The common wisdom is that this state has to move outside the container. (A related problem arises when containers run in geographically distributed data centers. There, the problem of simply storing the state is compounded by the problem of distributing it across locations.)

One commonly attempted solution is a storage appliance such as a NAS, but precisely because container-based applications are easy to scale, they can drive enough traffic to paralyze conventional file storage systems. Conventional storage also does little to address geodistribution. Nor does it deal with the fact that containerized applications often need a message streaming service (such as Apache Kafka) or a table of some kind (such as those provided by Apache Cassandra or Apache HBase). Putting Cassandra, Kafka, or HDFS into containers doesn't help, because that just creates more stateful containers.

The requirements posed by large-scale containerized systems aren't trivial, but they definitely aren't impossible. What is needed is a scalable persistence layer that supports multiple storage types and APIs and that containers can access conveniently and reliably, whether deployed on-premises, in the cloud, or in a hybrid system.

Paul Curtis demonstrates how a scalable data store that provides traditional updatable file semantics along with message streaming and NoSQL table APIs can solve these problems and greatly simplify containerized application development. You'll learn how to make your containerized applications, including stateful applications and legacy code, reliable and available in highly scalable systems, both locally and globally.

Paul Curtis
Principal Solutions Architect at Weaveworks, previously a Principal Engineer at MapR.