Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

What no one tells you about writing a streaming app?

Mark Grover (Cloudera), Ted Malaska (Blizzard)
14:05–14:45 Thursday, 25 May 2017
Level: Advanced
Average rating: 4.00 (4 ratings)

Who is this presentation for?

  • Architects and developers

Prerequisite knowledge

  • Basic knowledge of streaming engines, such as Spark Streaming

What you'll learn

  • Understand how to write a solid streaming app

Description

Any nontrivial streaming app requires that you consider a number of important topics, but questions like how to manage offsets or state often go unanswered. Mark Grover and Ted Malaska share practices that no one talks about when you start writing a streaming app but that you’ll inevitably need to learn along the way.

Topics include:

  • How do I manage offsets?
  • How do I manage state?
  • How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
  • How do I gracefully shut down my streaming job?
  • How do I monitor and manage (e.g., retry logic) my streaming job?
  • How can I better manage the DAG in my streaming job?
  • When should I use checkpointing, and for what? When should I not use it?
  • Do I need a write-ahead log (WAL) when using a streaming data source? Why? When don’t I need one?

Mark Grover

Cloudera

Mark Grover is a software engineer working on Apache Spark at Cloudera. Mark is a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He has contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He is a coauthor of Hadoop Application Architectures and also wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data at various national and international conferences. He occasionally blogs on topics related to technology.

Ted Malaska

Blizzard

Ted Malaska is a senior solution architect at Blizzard. Previously, he was a principal solutions architect at Cloudera. Ted has 18 years of professional experience working for startups, the US government, some of the world’s largest banks, commercial firms, bio firms, retail firms, hardware appliance firms, and the largest nonprofit financial regulator in the US. He has worked on close to one hundred clusters for over two dozen clients, covering hundreds of use cases. He has architecture experience across topics including Hadoop, Web 2.0, mobile, SOA (ESB, BPM), and big data. Ted is a regular contributor to the Hadoop, HBase, and Spark projects, a regular committer to Flume, Avro, Pig, and YARN, and the coauthor of Hadoop Application Architectures.

Comments

Mark Grover | SOFTWARE ENGINEER
5/06/2017 20:26 BST

http://tiny.cloudera.com/streaming-app

Binyamin Bazomnik | C4I OFFICER
5/06/2017 19:59 BST

Very interesting, can you share the slides?

Thanks.

Aitezaz Sheikh | SENIOR DEVELOPER
26/05/2017 15:50 BST

Hi,

Where can I find the slides for this presentation?

Kind regards