Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Making Stateless Containers Reliable and Available Even with Stateful Applications

Paul Curtis (MapR Technologies)
16:35–17:15 Wednesday, 23 May 2018

Who is this presentation for?

Enterprise Architects, Systems Managers, System Administrators

Prerequisite knowledge

Container strategies and tools: Docker, Kubernetes, OpenShift, Mesos.

What you'll learn

How to handle persistent data in large big data environments, and how to do so across multiple data centers or geographic locations.

Description

Persisting large amounts of state information inside a container is a classic way to make a container system work poorly. The problem is that container systems assume that containers are ephemeral, that it’s OK to kill one if resource conflicts come up and that they can be restarted quickly on pretty much any available host. Lots of state inside the container that has to be reconstituted on restart makes it impossible to restart quickly. The common wisdom is that this state has to move outside the container.
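The "state outside the container" pattern can be sketched in a Kubernetes manifest. This is an illustrative fragment, not part of the talk; the image, mount path, and claim name are hypothetical. Instead of writing to the container's writable layer, the pod mounts a volume bound to external storage, so a restarted replica finds its state waiting:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app            # illustrative name
spec:
  containers:
  - name: app
    image: example/app:latest   # hypothetical image
    volumeMounts:
    - name: app-state
      mountPath: /var/lib/app   # the app writes state here, outside the image
  volumes:
  - name: app-state
    persistentVolumeClaim:
      claimName: app-state-pvc  # claim bound to external storage, not the node
```

Because the state lives behind the claim rather than in the container, the scheduler is free to kill the pod and restart it on any host without a lengthy reconstitution step.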

A related problem occurs when containers are run in data centers that are geographically distributed. In this situation, the problem of simply storing the state is compounded with the problem of geographically distributing this state.

One commonly attempted solution is to use some sort of storage appliance such as a NAS. But container-based applications can drive more than enough traffic to paralyze conventional file storage systems precisely because they are easy to scale. Conventional storage also doesn’t really help the problem of geo-distribution. Nor does it deal with the fact that containerized applications often need a message streaming service (such as Apache Kafka) or a table of some kind (such as provided by Apache Cassandra or Apache HBase). Putting Cassandra or Kafka or HDFS into containers doesn’t help because that just makes more stateful containers.

The requirements posed by large-scale containerized systems aren’t trivial, but they definitely aren’t impossible, either. What is needed is a scalable persistence layer that supports multiple types of storage and multiple APIs to which containers have convenient and reliable access, whether deployed on premises, in the cloud, or in a hybrid of on-premises and cloud. In this talk I will show how a scalable data store that provides traditional updateable file semantics along with message streaming and NoSQL table APIs can solve these problems and allow very simple containerized application development. The audience will learn how to make containerized applications, including stateful applications and legacy code, reliable and available in highly scalable systems, both locally and globally.
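From the application's side, "traditional updateable file semantics" means the container can persist state with nothing more exotic than plain file I/O against a mounted path. The following is a minimal sketch of that idea, not code from the talk: `STATE_DIR` stands in for a hypothetical mount point where a shared persistence layer would be exposed inside the container (a temporary directory is used here so the example runs anywhere).

```python
import json
import os
import tempfile

# Hypothetical mount point for an external persistence layer; a temp
# directory stands in for it so this sketch is runnable on its own.
STATE_DIR = os.environ.get("STATE_DIR", tempfile.mkdtemp())

def save_checkpoint(name, state):
    """Persist application state outside the container via plain file I/O."""
    path = os.path.join(STATE_DIR, name + ".json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: a restart never sees partial state
    return path

def load_checkpoint(name):
    """Reload state after a container restart, from the same mounted path."""
    path = os.path.join(STATE_DIR, name + ".json")
    with open(path) as f:
        return json.load(f)

save_checkpoint("worker-1", {"offset": 42})
print(load_checkpoint("worker-1")["offset"])  # 42
```

The write-then-rename step matters in this setting: because the container can be killed at any moment, state files must always be complete or absent, never half-written.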

Paul Curtis

MapR Technologies

Paul Curtis is a principal solutions engineer at MapR, where he provides pre- and postsales technical support to MapR’s worldwide systems engineering team. Previously, Paul served as senior operations engineer for Unami, a startup founded to deliver on the promise of interactive TV for consumers, networks, and advertisers; systems manager for Spiral Universe, a company providing school administration software as a service; senior support engineer positions at Sun Microsystems; enterprise account technical management positions for both Netscape and FileNet; and roles in application development at Applix, IBM Service Bureau, and Ticketron. Paul got started in the ancient personal computing days; he began his first full-time programming job on the day the IBM PC was introduced.
