Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

What no one tells you about writing a streaming app?

Mark Grover (Cloudera), Ted Malaska (Blizzard Entertainment)
14:05–14:45 Thursday, 25 May 2017
Level: Advanced
Average rating: 4.00 (4 ratings)

Who is this presentation for?

  • Architects and developers

Prerequisite knowledge

  • Basic knowledge of streaming engines, such as Spark Streaming

What you'll learn

  • Understand how to write a solid streaming app

Description

Any nontrivial streaming app requires that you consider a number of important topics, but questions like how to manage offsets or state often go unanswered. Mark Grover and Ted Malaska share practices that no one talks about when you start writing a streaming app but that you’ll inevitably need to learn along the way.

Topics include:

  • How do I manage offsets?
  • How do I manage state?
  • How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
  • How do I gracefully shut down my streaming job?
  • How do I monitor and manage (e.g., retry logic) my streaming job?
  • How can I better manage the DAG in my streaming job?
  • When should I use checkpointing, and for what? When should I not use checkpointing?
  • Do I need a WAL when using a streaming data source? Why? When don’t I need one?

Mark Grover

Cloudera

Mark Grover is a software engineer working on Apache Spark at Cloudera. He is a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry, and he has contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He is a coauthor of Hadoop Application Architectures and also wrote a section in Programming Hive. Mark is a sought-after speaker on big data topics at national and international conferences. He occasionally blogs on topics related to technology.

Ted Malaska

Blizzard Entertainment

Ted Malaska is a group technical architect on the Battle.net team at Blizzard, helping support great titles like World of Warcraft, Overwatch, and Hearthstone. Previously, Ted was a principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem, and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.

Comments

Mark Grover | SOFTWARE ENGINEER
5/06/2017 20:26 BST

http://tiny.cloudera.com/streaming-app

Binyamin Bazomnik | C4I OFFICER
5/06/2017 19:59 BST

Very interesting, can you share the slides?

Thanks.

Aitezaz Sheikh | SENIOR DEVELOPER
26/05/2017 15:50 BST

Hi,

Where can I find the slides for this presentation?

Kind regards