Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

What no one tells you about writing a streaming app

Mark Grover (Lyft), Ted Malaska (Capital One)
14:05–14:45 Thursday, 25 May 2017
Level: Advanced
Average rating: 4.00 (4 ratings)

Who is this presentation for?

  • Architects and developers

Prerequisite knowledge

  • Basic knowledge of streaming engines, such as Spark Streaming

What you'll learn

  • Understand how to write a solid streaming app


Any nontrivial streaming app requires that you consider a number of important topics, but questions like how to manage offsets or state often go unanswered. Mark Grover and Ted Malaska share practices that no one talks about when you start writing a streaming app but that you’ll inevitably need to learn along the way.

Topics include:

  • How do I manage offsets?
  • How do I manage state?
  • How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
  • How do I gracefully shut down my streaming job?
  • How do I monitor and manage my streaming job (e.g., with retry logic)?
  • How can I better manage the DAG in my streaming job?
  • When should I use checkpointing, and for what? When shouldn't I use it?
  • Do I need a write-ahead log (WAL) when using a streaming data source? Why? When don't I need one?
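As a rough illustration of the first topic (offset management), and of one answer to the graceful-shutdown question, here is a framework-agnostic sketch. It assumes a hypothetical durable `OffsetStore`, a dict of partitions standing in for a topic, and a `should_stop` callback; none of these are Spark or Kafka APIs, and this is not presented as the speakers' method. The pattern shown is "write output first, commit offsets second", which gives at-least-once delivery: a crash between the two steps replays the batch rather than losing it. In Spark Streaming's Kafka 0.10 integration, the analogous step is calling `commitAsync` via `CanCommitOffsets` after the batch's output action completes.

```python
"""Framework-agnostic sketch (hypothetical names throughout): commit source
offsets only after a batch's output has been written, and check for a stop
request only between batches so in-flight work finishes and is committed."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OffsetStore:
    """Hypothetical durable store: partition -> next offset to read."""
    committed: Dict[int, int] = field(default_factory=dict)

    def load(self, partition: int) -> int:
        return self.committed.get(partition, 0)

    def commit(self, partition: int, next_offset: int) -> None:
        self.committed[partition] = next_offset


def run_batches(
    log: Dict[int, List[str]],        # partition -> records (stand-in for a topic)
    store: OffsetStore,
    write_output: Callable[[List[str]], None],
    batch_size: int,
    should_stop: Callable[[], bool],  # polled only between rounds of batches
) -> None:
    while not should_stop():
        progressed = False
        for partition, records in log.items():
            start = store.load(partition)
            batch = records[start:start + batch_size]
            if not batch:
                continue
            write_output(batch)                          # 1. write output first...
            store.commit(partition, start + len(batch))  # 2. ...then commit offsets
            progressed = True
        if not progressed:
            break  # toy example: the "topic" is finite, so stop when drained
```

If `write_output` raises, that batch's offsets are never committed, so a restarted job re-reads exactly the uncommitted records. The trade-off is possible duplicate output on replay, which is why sinks in this pattern are usually made idempotent.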

Mark Grover

Lyft

Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.


Ted Malaska

Capital One

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.




Binyamin Bazomnik | C4I OFFICER
5/06/2017 19:59 BST

Very interesting. Can you share the slides?


26/05/2017 15:50 BST


Where can I find the slides for this presentation?

Kind regards