Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

The Data Intelligence Hub: On-demand Hadoop resource provisioning in Europe’s Industrial Data Space using Cloudera Altus

Sven Loeffler (Deutsche Telekom)
14:05–14:45 Thursday, 24 May 2018
Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 2/3
Level: Intermediate
Secondary topics: Telecom
Average rating: 2.00 (1 rating)

Who is this presentation for?

  • Innovation managers, data engineers, data analysts, and those working in business development

Prerequisite knowledge

  • A basic understanding of the big data ecosystem, particularly with regard to data-driven products and services

What you'll learn

  • Explore the Data Intelligence Hub, a reference architecture for standardized, secure data exchange between industries in the context of the internet of things

Description

Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems’s implementation of the Fraunhofer Industrial Data Space: a reference architecture for standardized, secure data exchange between industries in the context of the internet of things. The Data Intelligence Hub enables the ingestion, processing, and sharing of large volumes of industrial IoT data among upstream and downstream supply chain partners. Data can be ingested as a stream or in batches and is used to generate analytical insights that power use cases such as predictive maintenance, smart device monitoring, and supply chain optimization.

Data is ingested into the Data Intelligence Hub through a custom Apache Kafka-based connector that can scale to large volumes of streaming data. Ingested data is persisted to object storage, from which it can be fed into automated processing pipelines. Because the Data Intelligence Hub targets a large number of industrial customers, it must provision compute resources cost-effectively and avoid overprovisioning capacity.

The Data Intelligence Hub’s processing engine is built on Apache Spark, so customers can author their data transformation pipelines (e.g., ETL, data cleansing, machine learning) as Apache Spark jobs and submit them to run on their ingested IoT data. When a job is submitted, the Data Intelligence Hub provisions an on-demand Cloudera cluster using Cloudera Altus. The cluster exists only while the data transformation job is running and is released as soon as the job completes. This elasticity lets Data Intelligence Hub customers use the full power of the Cloudera CDH stack while paying for compute resources only when they actually need them.
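The job lifecycle described above (provision a cluster on submission, release it when the job finishes) can be sketched as a simple resource-scoping pattern. This is a minimal illustration under stated assumptions, not the Data Intelligence Hub's implementation: `FakeProvider`, `transient_cluster`, and `submit_job` are hypothetical names, and the provider only records events; it does not call Cloudera Altus or any real cluster API.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeProvider:
    """Hypothetical stand-in for an on-demand cluster service; it only records calls."""
    events: List[str] = field(default_factory=list)
    next_id: int = 0

    def provision(self, workers: int) -> str:
        self.next_id += 1
        cluster_id = f"cluster-{self.next_id}"
        self.events.append(f"provision {cluster_id} workers={workers}")
        return cluster_id

    def release(self, cluster_id: str) -> None:
        self.events.append(f"release {cluster_id}")


@contextmanager
def transient_cluster(provider: FakeProvider, workers: int) -> Iterator[str]:
    """Provision a cluster for the duration of one job, then always release it."""
    cluster_id = provider.provision(workers)
    try:
        yield cluster_id
    finally:
        # Runs whether the job succeeds or raises, so no cluster outlives its job.
        provider.release(cluster_id)


def submit_job(provider: FakeProvider, job: Callable[[str], object], workers: int = 4) -> object:
    """Run one (hypothetical) Spark-style job on its own short-lived cluster."""
    with transient_cluster(provider, workers) as cluster_id:
        provider.events.append(f"run on {cluster_id}")
        return job(cluster_id)


if __name__ == "__main__":
    provider = FakeProvider()
    result = submit_job(provider, lambda cid: f"model trained on {cid}")
    print(result)           # model trained on cluster-1
    print(provider.events)  # ['provision cluster-1 workers=4', 'run on cluster-1', 'release cluster-1']
```

Tying the release to a `finally` block is the design choice that matters here: billing stops when the job ends, including when it fails, which is what keeps an on-demand model from silently drifting back into overprovisioning.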

Sven details the architecture behind the Data Intelligence Hub, outlines how Cloudera Altus provides the on-demand compute provisioning that the Data Intelligence Hub’s processing engine requires, and shares a live demo in which an Altus job provisions an on-demand Cloudera cluster to build a predictive machine learning model from a large industrial dataset. Along the way, Sven explores the Industrial Data Space, Europe’s largest community for the exchange of industrial IoT data.


Sven Loeffler

Deutsche Telekom

Sven Löffler is a business development executive for big data analytics at T-Systems, where he is responsible for identifying and developing data analytics and data-driven solutions. Previously, he was an executive IT specialist (Open Group certified) and business development leader for IBM Watson and big data solutions. Over his 20-year career, he has held a number of sales support positions in Germany and Europe and has extensive experience in business intelligence, performance management, big data marketing and support services, technical sales, and marketing.