Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

The pitfalls of running a self-service big data platform

Sander Kieft (Sanoma Media)
2:05pm–2:45pm Thursday, September 28, 2017
Data-driven business management, Strata Business Summit
Location: 1E 10/11 Level: Non-technical

Who is this presentation for?

  • CTOs, CIOs, and product managers

Prerequisite knowledge

  • A basic understanding of Hadoop, business analytics, and ETL tooling

What you'll learn

  • Explore Sanoma's experience running big data as a self-service platform


Sanoma has been running big data as a self-service platform for over five years, mainly as a service for business analysts to work directly on the source data. The road to getting business analysts to directly do their analyses on Hadoop was far from smooth. Sander Kieft explores Sanoma’s journey and shares some lessons learned along the way.

When Sanoma started with Hadoop, the company’s knowledge was limited. To kickstart the project, the decision was made to work together with a business intelligence company. But because of the consultant’s own limited experience and the lack of Hadoop support in the company’s ETL tool, the process was painfully slow. In the end, a new setup had to be created, which included switching to a real Hadoop distribution.

Not everyone is accustomed to accessing data by programming in Java or writing SQL queries, so to really make the data worthwhile, Sanoma introduced Hue. To get business analysts up to speed, the company had to design a training program consisting of two full days covering SQL basics and Hive specifics as well as an introduction to the dashboard.

The original project really gained momentum thanks to redundant hardware from a virtualization project, allowing rapid growth with very limited investment. Of course, this came at a price: running so much end-of-support and end-of-life hardware is only possible in a colocation environment. Last year the decision was made to move to the cloud. On paper, it was an easy switch, but the reality was slightly more complicated.


Sander Kieft

Sanoma Media

Sander Kieft is the ICT architect at Sanoma Media, where he is responsible for the common services and performance-based titles within Sanoma. His team designs and builds (web) services for some of the largest websites and most popular mobile applications in the Netherlands, Belgium, and Finland. Sander has been working with large-scale data in media for 15 years and with Hadoop and big data platforms in production for nearly a decade. Previously, he was a developer, architect, and technology manager for some of the largest websites in the Netherlands.