ING is a data-driven enterprise that is investing heavily in big data, analytics, and stream processing. Like many other enterprises, ING deals with a large variety of data sources. Some support primary processes, while others are used to improve the quality of service and to keep internal operations running smoothly. The amount of data that must be handled exceeds the computing capacity of a single machine, so vertical scaling is hardly an option.
An important building block in ING’s analytics journey is a state-of-the-art data lake, built with Hadoop and Spark. The data lake replaces several enterprise data warehouses and is the central repository for all types of data, supporting the various kinds of queries its stakeholders demand: batch and real-time, on large and small datasets. Key elements of ING’s data lake are RESTful APIs, secured and managed access to big data storage and processing, and real-time streaming analytics. More often than not, data is handled as streams, and ING is experimenting with Kafka and stream processing (Spark, Flume, Flink) to provide faster, more reactive, and up-to-date user experiences and journeys. In addition, machine learning (MLlib) complements traditional SQL analytics to provide better insight into operational excellence, business processes, marketing, and security applications.
ING wants to help customers with their financial planning by providing useful insights and small pieces of advice. These insights should be based on an up-to-date customer profile and should be actionable (e.g., “We predict that your balance is dropping below zero. Do you want to transfer some money from your savings account? [Yes/No/Later]”). Bas Geerdink offers an overview of ING’s streaming analytics solution for providing actionable insights to customers, which is built with a combination of open source technologies, including Kafka, Flink, and Cassandra, and shares lessons learned, best practices, architecture designs, and code. The first version of this solution is currently in production in the Netherlands for a limited set of users, and as it is further developed, it will gradually be rolled out to millions of customers worldwide.
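To make the idea of an actionable insight concrete, here is a minimal sketch in plain Python of the kind of per-customer rule a streaming job might evaluate. It is not ING's implementation: the `Event` and `Insight` types, the `detect_insights` function, the event kinds, and the "booked balance plus known scheduled amounts" prediction are all assumptions made for illustration, and an in-memory loop stands in for a real stream processor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical account event; the field names are illustrative only."""
    customer_id: str
    kind: str      # "transaction" (already booked) or "scheduled" (known future amount)
    amount: float  # positive = credit, negative = debit


@dataclass(frozen=True)
class Insight:
    """Hypothetical actionable insight sent to a customer."""
    customer_id: str
    predicted_balance: float
    message: str


def detect_insights(events: Iterable[Event],
                    opening: Dict[str, float]) -> Iterator[Insight]:
    """Consume events in order, keeping state per customer,
    and yield an insight when the predicted balance drops below zero."""
    balance = dict(opening)         # booked balance per customer
    pending: Dict[str, float] = {}  # sum of known scheduled amounts per customer
    alerted = set()                 # customers already warned, to avoid repeats

    for event in events:
        cid = event.customer_id
        if event.kind == "transaction":
            balance[cid] = balance.get(cid, 0.0) + event.amount
        elif event.kind == "scheduled":
            pending[cid] = pending.get(cid, 0.0) + event.amount
        else:
            continue  # ignore unknown event kinds

        # Assumed prediction rule: booked balance plus everything scheduled.
        predicted = balance.get(cid, 0.0) + pending.get(cid, 0.0)
        if predicted < 0 and cid not in alerted:
            alerted.add(cid)
            yield Insight(
                customer_id=cid,
                predicted_balance=predicted,
                message=("We predict that your balance is dropping below zero. "
                         "Do you want to transfer some money from your "
                         "savings account? [Yes/No/Later]"),
            )
        elif predicted >= 0:
            alerted.discard(cid)  # allow a fresh warning after recovery


# Example run with made-up data.
events = [
    Event("alice", "transaction", -30.0),
    Event("bob", "transaction", -20.0),
    Event("alice", "scheduled", -90.0),    # predicted balance becomes -20
    Event("alice", "transaction", -10.0),  # still negative, no repeated warning
]
insights = list(detect_insights(events, {"alice": 100.0, "bob": 50.0}))
for insight in insights:
    print(insight.customer_id, insight.predicted_balance, insight.message)
```

In a real deployment the loop would be replaced by a stream processor reading from a message log, with the per-customer dictionaries held in fault-tolerant keyed state; the sketch only shows the shape of the rule and why an up-to-date profile matters for it.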
Bas Geerdink is an independent technology lead focusing on AI and big data. He has worked in several industries on state-of-the-art data platforms and streaming analytics solutions, both in the cloud and on premises. Bas has a background in software development, design, and architecture, with broad technical experience ranging from C++ to Prolog to Scala. His academic background is in artificial intelligence and informatics. Bas’s research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. He occasionally teaches programming courses and is a regular speaker at conferences and informal meetings.