Continuous integration (CI) pipelines generate massive amounts of messy log data. Pure Storage engineering runs over 70,000 tests per day, a volume that would otherwise require at least 20 triage engineers. Instead, Spark’s unified computing platform lets the company write a single application for both streaming and batch jobs, so a team of only three triage engineers can understand the state of the company’s CI pipeline. Spark indexes log data for real-time reporting (streaming), uses machine learning for performance modeling and prediction (batch), and reindexes old data against newly encoded patterns (batch). Ivan Jibaja discusses the use case for big data analytics technologies, the architecture of the solution, and lessons learned.
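To illustrate the "single application for both streaming and batch jobs" idea, here is a minimal sketch in plain Python (not Spark, and not Pure Storage's actual code): one shared parsing function backs both a streaming-style indexer and a batch reindexer. The log format, function names, and test names are invented for illustration only.

```python
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log-line format for illustration: "<timestamp> <test_id> <status>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<test>\S+)\s+(?P<status>PASS|FAIL)$")

def parse_line(line: str) -> Optional[dict]:
    """Shared parsing logic: the same function serves both paths."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None

def index_stream(lines: Iterable[str]) -> Iterator[dict]:
    """'Streaming' path: index records one at a time as they arrive,
    skipping lines that do not match the known pattern."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            yield rec

def reindex_batch(lines: list[str]) -> list[dict]:
    """'Batch' path: reprocess historical logs with the same parser,
    e.g. after a new pattern has been encoded in LINE_RE."""
    return list(index_stream(lines))

logs = [
    "2018-03-07T10:00:00 test_fs_write PASS",
    "garbage line with no status",
    "2018-03-07T10:00:05 test_fs_read FAIL",
]
failures = [r["test"] for r in reindex_batch(logs) if r["status"] == "FAIL"]
print(failures)  # prints ['test_fs_read']
```

In Spark itself the same effect comes from sharing one codebase between streaming and batch jobs; this sketch only shows the shape of that reuse, not the distributed execution.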
This session is sponsored by Pure Storage.
Ivan Jibaja is a tech lead for the big data analytics team at Pure Storage. Previously, he was part of the core development team that built the FlashBlade from the ground up. Ivan holds a PhD in computer science with a focus on systems and compilers from the University of Texas at Austin.
©2018, O'Reilly Media, Inc.