Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

From BI to big data; Or, There and back again

Francesco Mucio (Francescomuc.io)
16:35–17:15 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 14
Average rating: 4.43 (7 ratings)

Who is this presentation for?

BI and Data Engineers, Architects, Leads and Managers of Data Teams

Level

Beginner

Prerequisite knowledge

Basic data modelling and big data concepts

What you'll learn

- Classify your data based on their value
- New patterns that you need to know when moving to Big Data
- Basics of SparkSQL
- Basics of continuous applications

Description

There always comes a point where a growing company has to accept that its infrastructure must change so that it does not hinder further growth. This was also the case for the BI infrastructure at Zalando. To be future-proof, we decided to embrace the cloud, the data lake, and big data. Not so fast.

Moving the Business Intelligence team away from the coziness of RDBMS, ACID transactions, and years of experience required a lot of effort. Even the most motivated BI engineers can feel lost when presented with the skills, tools, and patterns they need.

First we had to learn new words to talk with our Big Data colleagues (or even to google things), then we had to learn a new language to explain ourselves, and finally we started building our data pipelines (or are they just ETL processes?).

In this presentation Francesco and Alberto will show how to:

- Identify Bronze, Silver and Gold data and what these labels mean for a BI practitioner.
- Convert an SQL query to Spark syntax.
- Process streaming data with Structured Streaming and SparkSQL.
- Generate surrogate keys in a distributed world.
- And more.

These are problems we had to tackle early to give our engineers the confidence to step into Spark and the cloud. Within a very short time, they naturally started using Scala alongside SparkSQL.
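
As an illustration of one of the problems listed above, generating surrogate keys in a distributed world, here is a minimal sketch in plain Python. It is not necessarily the approach shown in the talk. It simulates partitions as lists and assumes the natural keys are distinct; the function name `assign_surrogate_keys` and the parameter `max_existing_key` are hypothetical. The idea is similar in spirit to Spark's `RDD.zipWithIndex`: count the rows per partition first, derive a starting offset for each partition, and then let every partition number its own rows independently.

```python
from itertools import accumulate
from typing import Dict, List


def assign_surrogate_keys(
    partitions: List[List[str]], max_existing_key: int = 0
) -> Dict[str, int]:
    """Assign unique, gap-free surrogate keys to natural keys spread over partitions.

    Assumption: natural keys are distinct across all partitions.
    """
    # Pass 1: every partition reports only its row count.
    counts = [len(partition) for partition in partitions]

    # The offset of a partition is the number of rows in all earlier partitions.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: each partition numbers its rows locally and shifts them by its
    # offset, so no coordination between partitions is needed at this stage.
    keys: Dict[str, int] = {}
    for offset, partition in zip(offsets, partitions):
        for local_index, natural_key in enumerate(partition, start=1):
            keys[natural_key] = max_existing_key + offset + local_index
    return keys


if __name__ == "__main__":
    new_customers = [["a", "b"], ["c"], ["d", "e", "f"]]
    # Continue numbering after the highest key already in the dimension table.
    print(assign_surrogate_keys(new_customers, max_existing_key=100))
```

The trade-off in this sketch is an extra pass over the data to obtain the counts, in exchange for dense keys that continue from the highest key already in use, much like a sequence in an RDBMS.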

Looking back at our journey, we noticed how much time we could have saved if we had had recommendations or best practices for these problems. We are trying to share them here.

Francesco Mucio

Francescomuc.io

Francesco Mucio is a BI architect at Zalando. The first time Francesco met the word data, it was just the plural of datum. Now he’s helping to redraw Zalando’s data architecture. He likes to draw data models and optimize queries. He spends his free time with his daughter, who, for some reason, speaks four languages.