Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Introducing Apache Flink: Fast and reliable data analytics in clusters

Stephan Ewen (data Artisans)
17:05–17:45 Thursday, 7/05/2015
Hadoop & Beyond
Location: Buckingham Room - Palace Suite
Average rating: 4.20 (5 ratings)

Prerequisite Knowledge

none

Description

Apache Flink (http://flink.incubator.apache.org) is an open source project undergoing incubation at the Apache Software Foundation. Flink is a data analysis engine designed to match Hadoop in reliability and Spark in performance.

The project pushes the technology forward in several ways. Flink is compatible with the Hadoop ecosystem and runs on top of HDFS and YARN. Flink programs are not executed as written: Flink’s cost-based optimizer first chooses how to run them, much as SQL engines optimize relational algebra queries. As a result, Flink applications need little (re-)configuration or maintenance when cluster characteristics change or the data evolves over time.
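To make the optimizer idea concrete, here is a toy sketch of one kind of decision a cost-based optimizer makes for a join: broadcast the smaller input to every parallel worker, or repartition both inputs by key. This is not Flink’s optimizer or API; `JoinPlan`, `choose_join_strategy`, and the network-only cost model are hypothetical, and a real optimizer weighs many more factors (local join algorithms, existing partitioning, sort orders).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPlan:
    strategy: str        # which ship strategy was chosen
    estimated_cost: int  # estimated bytes sent over the network


def choose_join_strategy(left_bytes: int, right_bytes: int, parallelism: int) -> JoinPlan:
    """Pick the cheaper of two ship strategies under a toy network-cost model.

    Hypothetical cost model, for illustration only:
    - broadcast: the smaller input is copied to each of `parallelism` workers.
    - repartition: both inputs are hash-partitioned by key, so roughly every
      byte crosses the network once.
    """
    if left_bytes <= right_bytes:
        broadcast = JoinPlan("broadcast-left", left_bytes * parallelism)
    else:
        broadcast = JoinPlan("broadcast-right", right_bytes * parallelism)
    repartition = JoinPlan("repartition", left_bytes + right_bytes)
    # Ties go to broadcast because min() keeps the first minimal element.
    return min((broadcast, repartition), key=lambda plan: plan.estimated_cost)


# Usage: a huge table joined with a small one on 8 workers.
print(choose_join_strategy(10_000_000_000, 1_000_000, parallelism=8))
# JoinPlan(strategy='broadcast-right', estimated_cost=8000000)
```

Because the choice depends on size estimates and parallelism, the same program can receive a different plan when the data grows or the cluster changes, without the user rewriting it.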

Flink’s runtime implements a unique approach to memory management: it executes in memory as much as possible and degrades gracefully to disk-based execution when memory runs short. Flink also introduces native closed-loop iteration operators, which make graph analysis and machine learning applications very fast on the platform.
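The iteration idea can be pictured with connected components computed as a delta (workset) iteration, sketched here in plain Python rather than Flink’s API. A solution set holds each vertex’s current label, and a workset holds only the vertices whose label just changed, so each round does less work as the computation converges. In a native iteration the loop runs inside the dataflow instead of being driven from a client program; the `while` loop below merely stands in for that, and `connected_components` is a hypothetical name.

```python
from collections import defaultdict


def connected_components(edges: list[tuple[int, int]]) -> dict[int, int]:
    """Label every vertex with the smallest vertex id in its component."""
    neighbors: dict[int, set[int]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: each vertex starts labeled with its own id.
    labels = {v: v for v in neighbors}
    # Workset: initially every vertex must propagate its label.
    workset = set(neighbors)

    while workset:  # stop when no label changed in the last round
        next_workset = set()
        for v in workset:
            for n in neighbors[v]:
                if labels[v] < labels[n]:
                    labels[n] = labels[v]  # update the solution set in place
                    next_workset.add(n)    # only changed vertices recur
        workset = next_workset
    return labels


print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

Only vertices whose labels changed are revisited, which is why delta-style iterations tend to pay off on graphs where most of the graph settles after a few rounds.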

Finally, Flink’s runtime is a true streaming engine that unifies batch processing and stream processing in a single system. Flink is an active open source project with more than 70 contributors from industry and academia.
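One way to picture batch processing as a special case of stream processing is with Python generators. This is an analogy, not Flink’s runtime or API: the same operator consumes records one at a time and emits updated results, whether the input is finite (batch) or endless (stream). All function names here, including the `endless_source` stand-in for an unbounded source, are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def running_word_counts(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Streaming operator: emit (word, count so far) as each word arrives."""
    counts: Counter[str] = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


def final_counts(lines: Iterable[str]) -> dict[str, int]:
    """Batch result: keep the last update per word once a bounded input ends."""
    result: dict[str, int] = {}
    for word, count in running_word_counts(lines):
        result[word] = count
    return result


def endless_source(line: str) -> Iterator[str]:
    """Hypothetical unbounded source that yields the same line forever."""
    while True:
        yield line


def take(n: int, stream: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Read the first n results from a possibly infinite stream."""
    return list(islice(stream, n))


print(final_counts(["to be or not to be"]))           # bounded input: batch
print(take(4, running_word_counts(endless_source("to be"))))  # unbounded input
```

The operator code is identical in both calls; only the source differs, and with a bounded source the final updates are the batch answer.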


Stephan Ewen

data Artisans

Stephan Ewen is one of the originators and committers of the Apache Flink project and is CTO at a Berlin-based startup, where he leads the effort to create a novel distributed system for reliable large-scale data processing.

Stephan holds a Ph.D. from the Berlin University of Technology and is a co-author of the Stratosphere system. He previously worked on data processing technologies at IBM and Microsoft.