Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Top five mistakes when writing streaming applications

Ted Malaska (Capital One)
11:15am–11:55am, Wednesday, December 6, 2017
Average rating: 4.67 (9 ratings)

Who is this presentation for?

  • Software engineers, tech leads, and architects

Prerequisite knowledge

  • A basic understanding of the streaming ecosystem

What you'll learn

  • Learn best practices for writing streaming applications


Ted Malaska shares the top five mistakes no one talks about when you start writing a streaming application, as well as the practices you’ll inevitably have to learn as you go.

Topics include:

  • How do I manage offsets?
  • How do I manage state?
  • How do I make my Spark Streaming job resilient to failures? Can I avoid some failures altogether?
  • How do I gracefully shut down my streaming job?
  • How do I monitor and manage my streaming job (e.g., with retry logic)?
  • How can I better manage the DAG in my streaming job?
  • When do I use checkpointing and for what? When shouldn’t I use checkpointing?
  • Do I need a write-ahead log (WAL) when using a streaming data source? Why? When don’t I need one?

Ted Malaska

Capital One

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; a principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many other projects. Ted is a coauthor of Hadoop Application Architectures, a frequent conference speaker, and a regular blogger on data architectures.