Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

High-performance data flow with a GUI—and guts

Simon Elliston Ball (Cloudera)
16:35–17:15 Thursday, 2/06/2016
IoT & real-time
Location: Capital Suite 12
Level: Advanced
Average rating: 4.38 (8 ratings)

Prerequisite knowledge

Attendees should have some knowledge of distributed messaging frameworks and Java.


Apache NiFi has seen it all. (It worked for the NSA, after all.) What it brings to the Hadoop ecosystem is a series of data flow and ingest patterns, a GUI, and a lot of security and record-level data provenance. Simon Elliston Ball offers an overview of Apache NiFi and explores its innovations around content and provenance repositories. He focuses on how NiFi achieves its throughput and performance, and on the internal data structures and code that let you trade off latency against throughput, or resilience against speed, in real time. Simon also covers in depth some of the key processors that make up NiFi data flows and examines the clues they offer for writing high-performance data flows on top of the NiFi framework.


Simon Elliston Ball


Simon Elliston Ball is a solutions engineer at Hortonworks, where he helps clients do Hadoop. Simon is a certified Spark and Hadoop developer. Previously, he worked in the data-intensive worlds of hedge funds and financial trading, ERP, and ecommerce, as well as designing and running nationwide networks and websites. Over the course of those roles, he designed and built several organization-wide data and networking infrastructures, headed up research and development teams, and designed (and implemented) numerous digital products and high-traffic transactional websites. For a change of technical pace, Simon writes and produces screencasts on frontend web technologies and performance and is an avid Node.js programmer.