Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Making stateless containers reliable and available even with stateful applications

Paul Curtis (Weaveworks)
16:35–17:15 Wednesday, 23 May 2018
Average rating: **** (4.00, 2 ratings)

Who is this presentation for?

  • Enterprise architects, systems managers, and system administrators

Prerequisite knowledge

  • Familiarity with container strategies and tools, such as Docker, Kubernetes, OpenShift, and Mesos

What you'll learn

  • Learn how to handle persistent data in large-scale big data environments spanning multiple data centers or geographic locations

Description

Persisting large amounts of state information inside a container is a classic way to make a container system work poorly. The problem is that container systems assume that containers are ephemeral—that it’s OK to kill one if resource conflicts come up and that they can be restarted quickly on pretty much any available host. Lots of state inside the container that has to be reconstituted on restart makes it impossible to restart quickly. The common wisdom is that this state has to move outside the container. (A related problem occurs when containers are run in data centers that are geographically distributed. In this situation, the problem of simply storing the state is compounded with the problem of geographically distributing this state.)
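As a concrete sketch of moving state outside the container (the abstract doesn't prescribe a mechanism; this assumes Kubernetes, one of the orchestrators listed under prerequisites, and uses hypothetical names), a PersistentVolumeClaim can be mounted into a pod so the data outlives any single container and travels with the workload when it is rescheduled:

```yaml
# Hypothetical example: claim storage from the cluster's default
# StorageClass and mount it into the pod, so application state
# survives container restarts and rescheduling onto another host.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app              # hypothetical pod name
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/app # state lives here, not in the container image
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

This keeps the container itself ephemeral, but note that a single ReadWriteOnce volume does nothing for the geographic-distribution problem described above; that requires a storage layer that replicates across sites.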

One commonly attempted solution is to use some sort of storage appliance, such as a NAS, but container-based applications can drive more than enough traffic to paralyze conventional file storage systems precisely because they are easy to scale. Conventional storage also doesn’t really help the problem of geodistribution. Nor does it deal with the fact that containerized applications often need a message streaming service (such as Apache Kafka) or a table of some kind (such as provided by Apache Cassandra or Apache HBase). Putting Cassandra or Kafka or HDFS into containers doesn’t help because that just makes more stateful containers.

The requirements posed by large-scale containerized systems aren’t trivial, but they definitely aren’t impossible. What is needed is a scalable persistence layer that supports multiple types of storage and multiple APIs to which containers have convenient and reliable access, whether deployed on-premises, in the cloud, or in a hybrid system.

Paul Curtis demonstrates how a scalable data store that provides traditional updateable file semantics along with message streaming and NoSQL table APIs can solve these problems and allow very simple containerized application development. You’ll learn how to make your containerized applications, including stateful applications and legacy code, reliable and available in highly scalable systems locally and globally.


Paul Curtis

Weaveworks

Paul Curtis is a principal solutions architect at Weaveworks; he was previously a principal engineer at MapR.