Sep 23–26, 2019

Fast Data with the KISSS stack

Bas Geerdink (ING)
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 15/16
Secondary topics: Data, Analytics, and AI Architecture; Streaming and IoT

Who is this presentation for?

Software/solution architects, people who are interested in fast data / streaming analytics

Level

Intermediate

Description

Streaming analytics (or fast data processing) is becoming an increasingly popular subject in enterprise organizations, because customers want real-time experiences, such as notifications and advice based on their online behaviour and other users’ actions. A typical streaming analytics solution follows a ‘pipes and filters’ pattern that consists of three main steps: detecting patterns in raw event data (Complex Event Processing), evaluating the outcomes with the aid of business rules and machine learning algorithms, and deciding on the next action. At the core of this architecture is the execution of predictive models that operate on enormous, never-ending data streams.
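As a rough illustration of the three-step ‘pipes and filters’ pattern described above, here is a minimal sketch in plain Python. It is not taken from the talk or its GitHub repository: the `Event` and `Pattern` types, the burst-of-three detection rule, and the 1000.0 threshold are all hypothetical stand-ins, and a fixed threshold takes the place of a real business rule or predictive model.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    """A raw event (hypothetical shape, for illustration only)."""
    customer: str
    amount: float


@dataclass(frozen=True)
class Pattern:
    """A detected pattern: a completed burst of events for one customer."""
    customer: str
    total: float
    count: int


def detect(events: Iterable[Event], burst_size: int = 3) -> Iterator[Pattern]:
    """Step 1: detect patterns on raw events (a toy stand-in for CEP).

    Emits a Pattern each time a customer accumulates `burst_size` events.
    """
    buffers = {}
    for event in events:
        buf = buffers.setdefault(event.customer, [])
        buf.append(event.amount)
        if len(buf) == burst_size:
            yield Pattern(event.customer, sum(buf), len(buf))
            buf.clear()


def evaluate(patterns: Iterable[Pattern],
             threshold: float = 1000.0) -> Iterator[Tuple[Pattern, bool]]:
    """Step 2: evaluate each pattern (assumed rule: total above threshold)."""
    for pattern in patterns:
        yield pattern, pattern.total > threshold


def decide(scored: Iterable[Tuple[Pattern, bool]]) -> Iterator[str]:
    """Step 3: decide on the next action for each evaluated pattern."""
    for pattern, suspicious in scored:
        yield ("ALERT " if suspicious else "OK ") + pattern.customer


def run_pipeline(events: Iterable[Event]) -> Iterator[str]:
    """Chain the three filters; each stage lazily consumes the previous one."""
    return decide(evaluate(detect(events)))
```

Because every stage is a generator, events flow through one at a time, which loosely mirrors how a streaming engine chains operators over an unbounded input.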

In this talk, I’ll present an architecture for streaming analytics solutions that covers many use cases that follow this pattern: actionable insights, fraud detection, log parsing, traffic analysis, factory data, the IoT, and others. I’ll go through a few architecture challenges that arise when dealing with streaming data, such as latency issues, event time vs server time, and exactly-once processing. The solution is built on the KISSS stack: Kafka, Impala, and Spark Structured Streaming. It is open source and available on GitHub.
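To make the ‘event time vs server time’ challenge concrete, the following is a small sketch in plain Python rather than Spark. It assumes hypothetical records that carry both an `event_time` (when the event happened) and a `server_time` (when it arrived), both in seconds, and counts them in 60-second tumbling windows. The field names and window size are illustrative assumptions only.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed tumbling-window size


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains `timestamp`."""
    return timestamp - timestamp % size


def count_per_window(records, time_field):
    """Count records per window, keyed by the chosen timestamp field."""
    counts = Counter()
    for record in records:
        counts[window_start(record[time_field])] += 1
    return dict(counts)


# Hypothetical records; the third one was delayed in transit, so it
# arrived at the server after its event-time window had already passed.
records = [
    {"event_time": 10, "server_time": 12},
    {"event_time": 55, "server_time": 58},
    {"event_time": 58, "server_time": 65},
    {"event_time": 70, "server_time": 71},
]

by_event_time = count_per_window(records, "event_time")
by_server_time = count_per_window(records, "server_time")
```

Here `by_event_time` puts three records in the first window, while `by_server_time` puts only two there, because the delayed record is attributed to the window in which it arrived. A streaming engine that aggregates on event time has to decide how long to wait for such late records before finalizing a window.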

Prerequisite knowledge

Basic knowledge of big data / fast data applications. Basic knowledge of application/solution architecture. Know what a reference architecture is and how to use one.

What you'll learn

Attendees will learn how to set up a streaming analytics (fast data) solution, pick up some basic concepts in this field, and get to know an open source technology stack that follows the patterns and principles of the reference architecture: Kafka, Impala, and Spark Structured Streaming.

Bas Geerdink

ING

Bas Geerdink is a programmer, scientist, and IT manager at ING, where he is responsible for the fast data systems that process and analyze streaming data. Bas has a background in software development, design, and architecture with broad technical experience from C++ to Prolog to Scala. His academic background is in artificial intelligence and informatics. Bas’s research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetings.

Contact list

View a complete list of Strata Data Conference contacts