Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams

Dean Wampler (Anyscale), Boris Lublinsky (Lightbend)

1:30pm–5:00pm Tuesday, March 6, 2018

Data engineering and architecture, Streaming systems and real-time applications
Location: 210 C/G

Average rating: 3.50 (2 ratings)

Who is this presentation for?

Data engineers and architects

Prerequisite knowledge

Programming experience, preferably with Java or Scala
A working knowledge of Kafka (useful but not required)

Materials or downloads needed in advance

**BEFORE** you arrive onsite for the tutorial, set up your laptop with the tutorial content by following the instructions in this GitHub repo: https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial

What you'll learn

Learn how to combine Kafka with Akka Streams and Kafka Streams to implement various streaming scenarios that leverage the strengths of these tools while avoiding their weaknesses, and understand how they compare to Spark Streaming and Flink

Description

If you’re building streaming data apps, your first inclination might be to reach for Spark Streaming, Flink, Apex, or similar tools, which run as services to which you submit jobs for execution. But sometimes, writing conventional microservices with embedded stream processing is a better fit for your needs.

Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics. With powerful stream and table abstractions and an exactly-once capability, it supports a variety of common scenarios involving transformation, filtering, and aggregation. Akka Streams emerged as a dataflow-centric abstraction for the Akka Actors model, designed for general-purpose microservices, especially when low per-event latency is important. Most systems provide efficient processing amortized over sets of records, but usually not low end-to-end latency per event (e.g., for complex event processing in true real-time applications). Also, because of its general-purpose nature, Akka Streams supports a wider class of application problems and third-party integrations but is less focused on Kafka-based applications. Both are primarily libraries that you integrate into your microservices, which means you must manage their lifecycles yourself, but you also get lots of flexibility to do this as you see fit.
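To make the read, process, and write pattern concrete, here is a minimal sketch in plain Java. It deliberately does not use the Kafka Streams or Akka Streams APIs: in-memory lists stand in for topics, a map stands in for a table, and the names `StreamTableSketch`, `Event`, `CountUpdate`, and `process` are hypothetical, chosen only for illustration. It shows the three operations named above (filtering, transformation, aggregation) applied one event at a time, with every change to the table emitted as a record on an output "topic".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: plain Java collections stand in for Kafka topics and tables.
final class StreamTableSketch {

    /** A keyed record, standing in for a message read from an input topic. */
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** One change to the running count for a key, as it might be written to an output topic. */
    static final class CountUpdate {
        final String key;
        final long count;

        CountUpdate(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /**
     * Processes the input one event at a time:
     * filter (drop events with a missing or blank value),
     * transform (normalize the key to lower case),
     * aggregate (keep a running count per key in the table).
     * Each table change is appended to the returned output list.
     */
    static List<CountUpdate> process(List<Event> inputTopic, Map<String, Long> table) {
        List<CountUpdate> outputTopic = new ArrayList<>();
        for (Event event : inputTopic) {
            if (event.value == null || event.value.trim().isEmpty()) {
                continue; // filter
            }
            String key = event.key.toLowerCase(Locale.ROOT); // transform
            long updated = table.merge(key, 1L, Long::sum);  // aggregate
            outputTopic.add(new CountUpdate(key, updated));
        }
        return outputTopic;
    }

    public static void main(String[] args) {
        Map<String, Long> table = new LinkedHashMap<>();
        List<Event> input = Arrays.asList(
                new Event("Sensor-A", "21.5"),
                new Event("sensor-b", ""),
                new Event("sensor-a", "22.0"));
        for (CountUpdate update : process(input, table)) {
            System.out.println(update.key + " -> " + update.count);
        }
    }
}
```

The stream of `CountUpdate` records and the final contents of the map are two views of the same data, which is the intuition behind pairing stream and table abstractions. A real service would replace the lists with topic consumers and producers and would have to handle state recovery and delivery guarantees, which this sketch ignores.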

In contrast, Spark Streaming and Flink run their own services. You write “jobs” or use interactive shells that tell these services what computations to do over data sources and where to send results. Spark and Flink then determine what processes to run in your cluster to implement the dataflows. Hence, there is less of a DevOps burden to bear but also less flexibility when you might need it. Both systems are also more focused on data analytics problems, with various levels of support for SQL over streams, machine learning model training and scoring, etc.

Join Dean Wampler and Boris Lublinsky to learn how to build two microservice streaming applications based on Kafka using Akka Streams and Kafka Streams for data processing. You’ll explore the strengths and weaknesses of each tool for particular design needs and contrast them with Spark Streaming and Flink, so you’ll know when to choose them instead. You’ll be given an execution environment and the code examples in a GitHub repo, and Dean and Boris will walk you through the examples, interspersed with short presentations, helping you understand their strengths, weaknesses, performance characteristics, and lifecycle management requirements.

Dean Wampler

Anyscale

Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.

Boris Lublinsky

Lightbend

Boris Lublinsky is a principal architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Previously, he was responsible for setting architectural direction, conducting architecture assessments, and creating and executing architectural road maps in fields such as big data (Hadoop-based) solutions, service-oriented architecture (SOA), business process management (BPM), and enterprise application integration (EAI). Boris is the coauthor of Applied SOA: Service-Oriented Architecture and Design Strategies, Professional Hadoop Solutions, and Serving Machine Learning Models. He’s also cofounder of and frequent speaker at several Chicago user groups.

Comments on this page are now closed.

Comments

Dean Wampler | HEAD OF DEVELOPER RELATIONS

03/03/2018 6:58am PST

Hi, Sonali. That should be fine. We will talk about architectural concerns in our presentation slides at different points during the tutorial, but spend most of our time discussing the code examples. You don’t have to edit or run anything if you choose not to do so or don’t know how. You can observe as we discuss the code and the points we make about it.

If you decide not to attend, I have a session Thursday afternoon that discusses the same concepts.

Sonali Gupta | VP PRODUCT DEVELOPMENT

03/02/2018 1:25pm PST

Hi,

Is this session doable for someone who does not have hands-on Java / Scala programming proficiency but is more interested in understanding this at architecture level?

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com