Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Continuous integration at scale: Streaming 50 billion events per day for real-time feedback with Kafka and Spark (sponsored by Pure Storage)

Ivan Jibaja (Pure Storage)
11:20am–12:00pm Thursday, September 28, 2017
Location: 1E 06

What you'll learn

  • Explore Pure Storage's streaming big data analytics pipeline


Pure Storage runs a lean engineering workforce, with only 5% of the team dedicated to QA. As a result, the company has invested heavily in automated testing for a continuous integration and release cycle. As the products and teams grow, the number of tests has exploded, requiring an automated solution to prioritize and classify failures and to understand their root causes.

Ivan Jibaja offers an overview of Pure Storage’s streaming big data analytics pipeline, which uses open source technologies like Spark and Kafka to process over 30 billion events per day and provide real-time feedback in under five seconds. This pipeline is supported by Pure Storage’s FlashBlade as a shared storage solution, which enables a streaming use case as well as on-demand batch analytics. Ivan explores the use case for big data analytics technologies, the lessons learned from this project, and the underlying elastic infrastructure that provides flexible scaling, agility, and simplicity across multiple application clusters.

This session is sponsored by Pure Storage.

Ivan Jibaja

Pure Storage

Ivan Jibaja is a tech lead for the big data analytics team at Pure Storage. Previously, he was a part of the core development team that built the FlashBlade from the ground up. Ivan holds a PhD in computer science with a focus on systems and compilers from the University of Texas at Austin.