Presented by O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Building a streaming analytics solution to provide real-time actionable insights to customers

Bas Geerdink (Aizonic)
4:00pm–4:30pm Tuesday, March 14, 2017
DCS, Strata Business Summit
Location: LL20 A
Level: Beginner
Average rating: 5.00 (2 ratings)

ING is a data-driven enterprise that is investing heavily in big data, analytics, and stream processing. Like many other enterprises, ING deals with a wide variety of data sources. Some support primary processes, while others are used to improve the quality of service and to keep internal operations running smoothly. The amount of data that must be handled exceeds the computing capacity of a single machine, and vertical scaling is hardly an option.

An important building block in ING’s analytics journey is a state-of-the-art data lake built with Hadoop and Spark. The data lake replaces several enterprise data warehouses and serves as the central repository for all types of data, supporting the kinds of queries its stakeholders demand: batch and real-time, on both large and small datasets. Key elements of ING’s data lake are RESTful APIs, secured and managed access to big data storage and processing, and real-time streaming analytics. More often than not, data is handled as streams, and ING is experimenting with Kafka and stream computing (Spark, Flume, Flink) to provide faster, more reactive, and up-to-date user experiences and journeys. In addition, machine learning (MLlib) is complementing traditional SQL analytics to provide better insight into operational excellence, business processes, marketing, and security applications.

ING wants to help customers with their financial planning by providing useful insights and small pieces of advice. These insights should be based on an up-to-date customer profile and should be actionable (e.g., “We predict that your balance will drop below zero. Do you want to transfer some money from your savings account? [Yes/No/Later]”). Bas Geerdink offers an overview of ING’s streaming analytics solution for providing actionable insights to customers, built with a combination of open source technologies including Kafka, Flink, and Cassandra, and shares lessons learned, best practices, architecture designs, and code. The first version of this solution is currently in production in the Netherlands for a limited set of users; as it is developed further, it will gradually be rolled out to millions of customers worldwide.

Bas Geerdink
Bas Geerdink is an independent technology lead focusing on AI and big data. He has worked in several industries on state-of-the-art data platforms and streaming analytics solutions, both in the cloud and on premises. Bas has a background in software development, design, and architecture, with broad technical experience ranging from C++ to Prolog to Scala. His academic background is in artificial intelligence and informatics. Bas’s research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetups.