Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Working within the Hadoop ecosystem to build a live-streaming data pipeline

Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
4:35pm–5:15pm Wednesday, September 27, 2017
Secondary topics: Architecture, Platform, Streaming

Who is this presentation for?

  • Software and data engineers

Prerequisite knowledge

  • A basic understanding of messaging frameworks, Spark applications development, and stream processing

What you'll learn

  • Learn how Big Fish Games used Spark, Flume, and Kafka to build a live-streaming data pipeline


Big Fish Games is a leading producer and distributor of casual and midcore games. The company has created a mobile game platform that supports real-time data analytics and other business processes. The platform processes on average 5,000 events per second—a number that is continually growing. While the Hadoop ecosystem includes platforms and software library frameworks that can support this goal, not all components are appropriate within a given architecture, and other components require performance tuning and customization.

Stephen Devine and Kalah Brown explain how they used Spark, Flume, and Kafka to build a live-streaming data pipeline, focusing on their decision to move from Flume to Kafka. Along the way, they cover Spark Streaming checkpointing, its limitations, and how they overcame those limitations by building a custom fault-tolerance feature into their Spark consumer. Stephen and Kalah also describe how they removed performance bottlenecks by using Scala futures to write asynchronously to HDFS, and they share the Spark Streaming properties and configurations they found most impactful for improving performance.
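The asynchronous-write idea above can be sketched with plain Scala futures. This is a minimal illustration, not the speakers' actual code: `writeBatch` is a hypothetical stand-in for an HDFS client write of one micro-batch, and the simulated latency and batch data are invented for the example. The point is simply that each write is dispatched on an execution context instead of blocking the driver thread, and completion is awaited at a single safe point.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for an HDFS write of one micro-batch's records.
// A real pipeline would call the HDFS client here; we only simulate latency.
def writeBatch(path: String, records: Seq[String]): Unit = {
  Thread.sleep(50)
}

// Invented example data: output path -> records for that partition.
val batches = Map("part-0" -> Seq("a", "b"), "part-1" -> Seq("c"))

// Dispatch every write as a Future so the batches proceed concurrently
// instead of serializing on a single blocking write loop.
val writes: Seq[Future[Unit]] = batches.toSeq.map { case (path, recs) =>
  Future { writeBatch(path, recs) }
}

// Block only once, after all writes have been started.
Await.result(Future.sequence(writes), 10.seconds)
println("all writes complete")
```

Awaiting the combined `Future.sequence` at one point keeps a bounded place to surface write failures, rather than losing them on fire-and-forget tasks.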


Stephen Devine

Big Fish Games

Stephen Devine is a Seattle-based data engineer at Big Fish Games, where he wrangles events sent from millions of mobile phones through Kafka into Hive. Previously, he did similar things for Xbox One Live Services using proprietary Microsoft technology and worked on several releases of Internet Explorer at Microsoft.


Kalah Brown

Big Fish Games

Kalah Brown is a senior Hadoop engineer at Big Fish Games, where she is responsible for the technical leadership and development of big data solutions. Previously, Kalah was a consultant in the greater Seattle area and worked with numerous companies, including Disney, Starbucks, the Bill and Melinda Gates Foundation, Microsoft, and Premera Blue Cross. She has 17 years of experience in software development, data warehousing, and business intelligence.