Apache NiFi has seen it all. (It worked for the NSA, after all.) It brings to the Hadoop ecosystem a set of data flow and ingest patterns, a GUI, strong security, and record-level data provenance. Simon Elliston Ball offers an overview of Apache NiFi and explores its innovations around the content and provenance repositories. He focuses on how NiFi achieves its throughput and performance, and on the internal data structures and code that let you trade off latency against throughput, or resilience against speed, in real time. Simon also covers in depth some of the key processors that make up NiFi data flows and examines the clues they offer for writing high-performance data flows on top of the NiFi framework.
Simon Elliston Ball is a solutions engineer at Hortonworks, where he helps clients do Hadoop. Simon is a certified Spark and Hadoop developer. Previously, he worked in the data-intensive worlds of hedge funds and financial trading, ERP, and ecommerce, as well as designing and running nationwide networks and websites. Over the course of those roles, he designed and built several organization-wide data and networking infrastructures, headed up research and development teams, and designed (and implemented) numerous digital products and high-traffic transactional websites. For a change of technical pace, Simon writes and produces screencasts on frontend web technologies and performance and is an avid Node.js programmer.
©2016, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.