Knowledge of customers’ location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.
Gabor begins by discussing the motivation and business use case for the project. The Commercial Roaming Department analyzes how customers of the company’s networks in Germany and nine other European countries within the DTAG group use other service providers’ networks, and vice versa. These analyses are very important for negotiations with roaming partners around the world (Orange, Vodafone, O2, AT&T, Verizon, etc.). Each pair of service providers must have a contract with an agreed price list that sets how much one provider (such as DT) pays its roaming partner (such as Verizon) when its customers use the partner’s network in a foreign country. The roaming environment is changing rapidly, so it’s essential to have a clear picture of customers’ travel patterns and a better understanding of the drivers behind them.
Václav then covers the security aspect and architecture. You’ll learn why Deutsche Telekom decided to use Cloudera Hadoop to build a platform to support the necessary ad hoc and regular analytical activities. Because of very strict requirements from the Local Security and Data Privacy Department, the platform has to be a very secure environment. All customer data coming from the network must be anonymized and aggregated so that it’s not possible to identify a specific customer or that customer’s exact location, but it must still be possible to use the data for analyses and predictions.
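To make the aggregation requirement concrete, here is a minimal sketch in plain Python. It is not the project's actual method: the field names (customer_token, home_country, visited_country, day) and the minimum group size of 5 are assumptions chosen for illustration. It counts distinct, already anonymized customers per country pair and day, and drops any group too small to hide an individual.

```python
from collections import defaultdict

# Hypothetical threshold: groups with fewer distinct customers are suppressed.
MIN_GROUP_SIZE = 5


def aggregate_roamers(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct customers per (home country, visited country, day).

    `records` is an iterable of dicts with the assumed keys
    customer_token, home_country, visited_country, and day.
    Groups smaller than `min_group_size` are left out of the result.
    """
    groups = defaultdict(set)
    for record in records:
        key = (record["home_country"], record["visited_country"], record["day"])
        # A set keeps each customer once, even with many events per day.
        groups[key].add(record["customer_token"])

    return {
        key: len(tokens)
        for key, tokens in groups.items()
        if len(tokens) >= min_group_size
    }
```

The same grouping could be expressed as a distributed aggregation on the cluster; the plain-Python form is used here only to keep the sketch self-contained.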
A very important part of the implemented security concept is Sentry, which, together with Kerberos and LDAP, is used to authenticate and authorize users so that they can see only the data they are allowed to see. Anonymization, aggregation, and lookup methods are implemented in PySpark, and the keys used to anonymize sensitive values (phone number, SIM card ID, phone ID, location, etc.) are stored in an HSM (hardware security module) outside of the Cloudera Hadoop cluster. Cleansed data is then stored in HDFS as Parquet in a semiflat/JSON format so that it is accessible through ad hoc SQL queries in Hive or Impala. Visualization is achieved with Solr and Hue. The dashboards are regularly updated and then shared with Deutsche Telekom’s upper management.
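One common way to anonymize identifiers with a secret key is keyed hashing. The sketch below uses HMAC-SHA256 from the Python standard library; this choice of algorithm is an assumption, since the abstract does not say which method the project uses. The field names (msisdn, imsi, imei) and the in-memory key are also hypothetical; in the setup described above, the key material is held in an HSM, which may perform the keyed operation itself rather than hand the key to the caller.

```python
import hashlib
import hmac

# Hypothetical names for the sensitive columns (phone number, SIM ID, phone ID).
SENSITIVE_FIELDS = ("msisdn", "imsi", "imei")


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed, deterministic token for `value` (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with sensitive fields replaced by tokens.

    Non-sensitive fields are copied unchanged, and the input is not mutated.
    """
    cleaned = dict(record)
    for field in SENSITIVE_FIELDS:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(str(cleaned[field]), key)
    return cleaned


# Placeholder key for illustration only; a real key would never be hardcoded.
demo_key = b"placeholder-key-not-for-production"
demo_row = {"msisdn": "491700000000", "visited_country": "US"}
demo_clean = anonymize_record(demo_row, demo_key)
```

Because the same input and key always give the same token, records belonging to one customer can still be joined and counted after anonymization, while the original value cannot be recomputed without the key. In PySpark, a function like pseudonymize could be wrapped as a user-defined function and applied per column.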
Václav Surovec is a senior big data engineer and comanages the Big Data Department at Deutsche Telekom IT. The department’s more than 45 engineers deliver big data projects to Germany, the Netherlands, and the Czech Republic. Recently, he led the commercial roaming project. Previously, he worked at T-Mobile Czech Republic while he was still a student at Czech Technical University in Prague.
Gabor Kotalik is a big data project lead at Deutsche Telekom, where he’s responsible for continuous improvement of customer analytics and machine learning solutions for the commercial roaming business. He has more than 10 years of experience in business intelligence and advanced analytics focusing on using insights and enabling data-driven business decisions.
©2019, O'Reilly Media, Inc.