Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Fast data at ING: Utilizing Kafka, Spark, Flink, and Cassandra for data science and streaming analytics

Bas Geerdink (ING)
12:05–12:45 Thursday, 25 May 2017
Real-time applications, Spark & beyond
Location: Capital Suite 8/9
Level: Intermediate
Average rating: 3.25 (4 ratings)

Who is this presentation for?

  • Architects, managers, and developers

Prerequisite knowledge

  • Basic programming skills in at least one language (Java, Scala, C#, etc.)
  • Basic knowledge of big data tools, such as Hadoop and Spark

What you'll learn

  • Explore use cases for streaming analytics and see how they are developed and implemented at ING

Description

As a data-driven enterprise, ING is investing heavily in big data, analytics, and stream processing. Like many other enterprises, ING deals with a large variety of data sources. Some are responsible for primary processes, while others are used to improve the quality of service and keep internal operations running smoothly. The amount of data that must be handled exceeds the computing capacity of a single machine, so vertical scaling is hardly an option.

An important building block in ING’s analytics architecture is a state-of-the-art data lake, built with Hadoop and Spark. The data lake replaces several enterprise data warehouses and is the central repository for all types of data. It supports the queries our stakeholders need, both batch and real-time, on large and small datasets alike. Key elements of ING’s data lake include RESTful APIs, secured and managed access to big data storage and processing, and real-time streaming analytics. Data is more often than not handled as streams, and ING works with Kafka and stream computing (Spark, Flume, and Flink) to provide faster, more reactive, and up-to-date user experiences and journeys. In addition, machine learning (MLlib, H2O.ai, Python, and R) complements traditional SQL analytics to provide better insight into operational excellence, business processes, marketing, and security applications.

Bas Geerdink shares three use cases at ING that have a streaming data source at their core—the “look ahead” feature for predicting account balances, the actionable insights engine, and the fraud detection system—and discusses their respective architectures and technology. All software is currently in production, running with modern tools such as Kafka, Cassandra, Spark, Flink, and H2O.ai.

Bas Geerdink

ING

Bas Geerdink is a programmer, scientist, and IT manager at ING, where he is responsible for the fast data systems that process and analyze streaming data. Bas has a background in software development, design, and architecture with broad technical experience from C++ to Prolog to Scala. His academic background is in artificial intelligence and informatics. Bas’s research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetings.

Comments

Abdu Chadili
5/05/2018 15:27 BST

Where can we replay the presentation?
Thanks

Jerzy Kott | DATA ARCHITECT
29/05/2017 17:45 BST

Any chance to get access to this presentation?