Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Stream scaling in Pravega

Flavio Junqueira (Dell EMC)

16:35–17:15 Thursday, 24 May 2018

Big data and data science in the cloud, Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 8/9 Level: Intermediate

Who is this presentation for?

Data engineers, architects, and decision makers in the area of data infrastructure and data-driven applications

Prerequisite knowledge

A basic understanding of distributed system and messaging system concepts

What you'll learn

Learn how to build an elastic end-to-end stream pipeline with Pravega
Understand the guarantees Pravega provides in the presence of scaling and how applications need to code against it and how Pravega implements stream scaling

Description

Stream processing consists of ingesting and processing continuously generated data, often from end users in web applications or from more challenging settings where devices such as servers and sensors generate events at a high rate. Such scenarios often demand the use of a software stack that is able to scale and accommodate changes to the characteristics of the application.

One of the major challenges with processing data streams is adapting to workload variations (e.g., due to daily cycles or the growth of the population of sources). Systems to ingest stream data typically parallelize it by sharding the incoming messages and events according to a routing key. Having the ability to parallelize ingestion is very effective, but future changes to the workload (which are very often unknown beforehand) might make the initial choice for the degree of parallelism inadequate for even short-term spikes. Consequently, the ability to scale by adapting parallelism according to workload while preserving important API properties, such as per-key order, is highly desirable to handle mission-critical workloads.

Flavio Junqueira explains how to accommodate changes to workloads in and with Pravega, an open source stream store built to ingest and serve stream data. Pravega primarily manipulates and stores segments (append-only byte sequences), forming streams by creating and composing segments, which it uses to enable the scaling of streams. Stream scaling in Pravega is automatic and transparent to the application, but such a change to the ingestion volume might also require the application to follow and scale its resources downstream (e.g., the operators of an Apache Flink job) to accommodate the new ingestion volume. Pravega signals such changes to the application so that it can react accordingly. The cooperation between Pravega and the downstream application is crucial for building an effective stream data pipeline.

Flavio Junqueira

Dell EMC

Flavio Junqueira is senior director of software engineering at Dell EMC, where he leads the Pravega team. He is interested in various aspects of distributed systems, including distributed algorithms, concurrency, and scalability. Previously, Flavio held an engineering position with Confluent and research positions with Yahoo Research and Microsoft Research. He is an active contributor to Apache projects, including Apache ZooKeeper (as PMC and committer), Apache BookKeeper (as PMC and committer), and Apache Kafka. Flavio coauthored the O’Reilly ZooKeeper book. He holds a PhD in computer science from the University of California, San Diego.

Presented by

Elite Sponsors

Exabyte Sponsor

Impact Sponsors

Supporting Sponsor

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com