Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Continuous integration at scale: Streaming 50 billion events per day for real-time feedback with Kafka and Spark (sponsored by Pure Storage)

Ivan Jibaja (Pure Storage)
11:20am–12:00pm Thursday, September 28, 2017
Sponsored
Location: 1E 06

What you'll learn

  • Explore Pure Storage's streaming big data analytics pipeline

Description

Pure Storage runs a lean engineering workforce, with only 5% of the team dedicated to QA. As a result, the company has invested heavily in automated testing for a continuous integration and release cycle. As the products and teams grow, the number of tests has exploded, requiring an automated solution to prioritize, classify, and understand failure root causes.

Ivan Jibaja offers an overview of Pure Storage’s streaming big data analytics pipeline, which uses open source technologies like Spark and Kafka to process over 30 billion events per day and provide real-time feedback in under five seconds. This pipeline is supported by Pure Storage’s FlashBlade as a shared storage solution, which enables a streaming use case as well as on-demand batch analytics. Ivan explores the use case for big data analytics technologies, the lessons learned from this project, and the underlying elastic infrastructure that provides flexible scaling, agility, and simplicity across multiple application clusters.

This session is sponsored by Pure Storage.

Ivan Jibaja

Pure Storage

Ivan Jibaja is a FlashBlade engineer at Pure Storage, where he leads the team building a big data analytics pipeline for streaming telemetry data from Pure Storage’s testing infrastructure to classify, prioritize, and understand root causes of bugs in the software development cycle. Ivan holds a PhD in computer science from the University of Texas at Austin with a concentration in compilers and programming languages.