Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Data science at Deutsche Telekom: Predicting global travel patterns and network demand

Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
3:50pm-4:30pm Thursday, March 28, 2019

Who is this presentation for?

  • Hadoop architects, Hadoop security engineers, and telco experts



Prerequisite knowledge

  • A basic understanding of the Cloudera Hadoop big data stack

What you'll learn

  • How Deutsche Telekom fully secured its data (using Cloudera Hadoop) while still being able to use it to create interesting insights and results


Knowledge of customers’ location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.

Gabor begins by discussing the motivation and business use case for the project. The Commercial Roaming Department analyzes how customers of the DTAG group’s networks, in Germany and nine other European countries, use other service providers’ networks and vice versa. These analyses are very important for negotiations with other service providers (roaming partners) around the world (Orange, Vodafone, O2, AT&T, Verizon, etc.). Each pair of service providers must have a contract with an agreed price list that sets how much a provider (such as DT) pays its roaming partner (such as Verizon) when its customers use the partner’s network in a foreign country. The roaming environment is changing rapidly, so it’s essential to have a clear picture of customer and travel patterns and a better understanding of the drivers behind them.

Václav then covers the security aspect and architecture. You’ll learn why Deutsche Telekom decided to use Cloudera Hadoop to build a platform to support the necessary ad hoc and regular analytical activities. Because of very strict requirements from the Local Security and Data Privacy Department, the platform has to be a very secure environment. All customer data coming from the network must be anonymized and aggregated so that it’s not possible to identify a specific customer or their exact location, yet the data must still be usable for analyses and predictions.
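
The aggregation requirement can be illustrated with a minimal sketch. It is not Deutsche Telekom's implementation: the record layout, the `aggregate_visits` helper, and the `MIN_GROUP_SIZE` threshold are all assumptions made for illustration. It counts distinct (already tokenized) subscribers per home and visited country and suppresses groups that are too small to hide an individual.

```python
from collections import Counter

# Hypothetical threshold: groups with fewer subscribers are suppressed
# so that a single traveller cannot be singled out in the output.
MIN_GROUP_SIZE = 3


def aggregate_visits(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct subscribers per (home_country, visited_country).

    `records` is an iterable of (subscriber_token, home, visited) tuples,
    where subscriber_token is assumed to be anonymized already.
    """
    seen = set()
    counts = Counter()
    for subscriber_token, home, visited in records:
        key = (subscriber_token, home, visited)
        if key in seen:
            continue  # count each subscriber once per country pair
        seen.add(key)
        counts[(home, visited)] += 1
    # Drop groups below the threshold instead of reporting them.
    return {group: n for group, n in counts.items() if n >= min_group_size}


# Made-up sample events, for illustration only.
records = [
    ("a1", "DE", "US"), ("b2", "DE", "US"), ("c3", "DE", "US"),
    ("a1", "DE", "US"),  # repeated event for the same subscriber
    ("d4", "CZ", "FR"),  # single subscriber, will be suppressed
]
summary = aggregate_visits(records)
```

In this sketch only the DE to US group survives, because the CZ to FR group contains a single subscriber. The threshold value is arbitrary here; a real deployment would take it from its data privacy requirements.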

A very important part of the implemented security concept is Sentry, which (together with Kerberos and LDAP) is used to authenticate and authorize users so that they can see only the data they are allowed to see. Anonymization, aggregation, and lookup methods are implemented in PySpark, and the keys used to anonymize the sensitive values (phone number, SIM card ID, phone ID, location, etc.) are stored in an HSM (hardware security module) outside of the Cloudera Hadoop cluster. Cleansed data is then stored in HDFS and Parquet in a semiflat/JSON format so that it is accessible through ad hoc SQL queries in Hive or Impala. Visualization is achieved with Solr and Hue. The dashboards are regularly updated and then shared with Deutsche Telekom’s upper management.
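
As a rough illustration of keyed anonymization, the sketch below replaces sensitive fields with keyed hashes. The abstract does not say which algorithm is used, so HMAC-SHA256 is an assumption, as are the field names and the `pseudonymize` helper. Plain Python stands in for PySpark, and a hard-coded demo key stands in for a key that would be held in the HSM.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed, deterministic token for a sensitive value.

    The same value and key always give the same token, so records can
    still be joined and counted without exposing the original value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in a real setup the key would be obtained from the HSM,
# never hard-coded. This demo key exists only to make the sketch run.
DEMO_KEY = b"demo-key-not-for-production"

# Hypothetical field names and made-up values.
SENSITIVE_FIELDS = ("msisdn", "imsi")
record = {
    "msisdn": "+491700000001",
    "imsi": "262011234567890",
    "visited_country": "US",
}

# Tokenize only the sensitive fields and keep the rest unchanged.
safe_record = {
    field: pseudonymize(value, DEMO_KEY) if field in SENSITIVE_FIELDS else value
    for field, value in record.items()
}
```

Because the tokens are deterministic for a given key, the same subscriber maps to the same token across data sets, which keeps lookups and aggregations possible. Anyone without the key cannot recompute a token from a known phone number.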

Vaclav Surovec

Deutsche Telekom

Václav Surovec is a senior big data engineer and comanages the Big Data Department at Deutsche Telekom IT. The department’s more than 45 engineers deliver big data projects to Germany, the Netherlands, and the Czech Republic. Recently, he led the commercial roaming project. Previously, he worked at T-Mobile Czech Republic while he was still a student at Czech Technical University in Prague.

Gabor Kotalik

Deutsche Telekom

Gabor Kotalik is a big data project lead at Deutsche Telekom, where he’s responsible for continuous improvement of customer analytics and machine learning solutions for the commercial roaming business. He has more than 10 years of experience in business intelligence and advanced analytics focusing on using insights and enabling data-driven business decisions.