Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Enabling new streaming applications with Apache Flink

Stephan Ewen (data Artisans), Kostas Tzoumas (data Artisans)
14:55–15:35 Friday, 3/06/2016
IoT & real-time
Location: Capital Suite 12 Level: Intermediate
Tags: real-time, iot
Average rating: 4.67 (3 ratings)

Prerequisite knowledge

Attendees should have a basic familiarity with the applications of data analysis as well as some knowledge of the Hadoop ecosystem.

Description

Data streaming is emerging as an increasingly popular architectural pattern for data infrastructure. Streaming architectures embrace the fact that data in practice is not static but is continuously produced in the form of events. Streaming technology such as Apache Flink, one of the most popular stream processing frameworks in the Apache Software Foundation, follows this philosophy to the end: applications work directly on the streams of events and on isolated local state that aggregates the event histories. Streaming architectures backed by modern streaming systems such as Apache Flink promise, among other things:

  • Decreased latency from signal to decision: data can be analyzed in real time and before ingestion;
  • A unified way of handling real-time and historic data processing: the same program can be used to analyze the historical data (via “time travel” queries) and the real-time data;
  • Simple versioning of applications and their state (via consistent checkpoints of state built into the framework);
  • Simplification of the data processing stack, obviating the need for complex pipelines from ingestion to analytics with artificial data boundaries (batches).

Stephan Ewen and Kostas Tzoumas introduce the data streaming architecture paradigm and outline the building blocks of data streaming applications: event streams, transformations and windows, the different notions of time and how to handle them, and how to keep application state consistent. They then show how to build a set of simple but representative example applications using Apache Flink.
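
As a rough illustration of the building blocks named above (event streams, windows, event time, and keyed local state), the following is a minimal plain-Python sketch. It does not use Flink's API. The `Event` type, the `tumbling_window_sums` function, and the `max_delay` parameter are hypothetical names chosen for this example, and the simple watermark rule (largest timestamp seen minus a fixed delay) is an assumption rather than a description of how any particular system behaves.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    key: str        # hypothetical: e.g. a sensor or user id
    timestamp: int  # event time in seconds, assigned where the event occurred
    value: float


def tumbling_window_sums(
    events: Iterable[Event], window_size: int, max_delay: int
) -> Iterator[Tuple[str, int, float]]:
    """Yield (key, window_start, sum) for fixed-size event-time windows.

    A window is emitted once the watermark (largest timestamp seen so far
    minus max_delay) reaches the window's end. Events arriving for a window
    that has already been emitted are dropped in this sketch.
    """
    # Local state: one running sum per (key, window_start).
    state: Dict[Tuple[str, int], float] = defaultdict(float)
    watermark = float("-inf")

    for event in events:
        window_start = event.timestamp - event.timestamp % window_size
        if window_start + window_size <= watermark:
            continue  # too late: this window was already closed

        state[(event.key, window_start)] += event.value
        watermark = max(watermark, event.timestamp - max_delay)

        # Emit every window whose end the watermark has reached.
        closed = sorted(
            (k for k in state if k[1] + window_size <= watermark),
            key=lambda k: (k[1], k[0]),
        )
        for key, start in closed:
            yield key, start, state.pop((key, start))

    # A finite (historical) input ends here: flush the remaining windows.
    for key, start in sorted(state, key=lambda k: (k[1], k[0])):
        yield key, start, state[(key, start)]
```

Because the function only consumes an iterable, the same code can be fed a replayed historical log (a list) or a live source (a generator). That is the idea behind handling real-time and historical data with one program, shown here only in miniature.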

Stephan Ewen

data Artisans

Stephan Ewen is one of the originators and committers of the Apache Flink project and CTO at data Artisans, leading the development of large-scale data stream processing technology. Stephan coauthored the Stratosphere system and has worked on data processing technologies at IBM and Microsoft. Stephan holds a PhD from the Berlin University of Technology.

Kostas Tzoumas

data Artisans

Kostas Tzoumas is a PMC member of the Apache Flink project and cofounder of data Artisans, the company founded by the original development team that created Flink. Kostas has spoken extensively about Flink, including at Hadoop Summit San Jose 2015.