Every growing company reaches a point where it has to accept that its infrastructure must change so that it does not hinder further growth. This was also the case for the BI infrastructure at Zalando. To be future-proof, we decided to embrace the cloud, the data lake, and big data. Not so fast.
Moving the Business Intelligence team away from the coziness of RDBMS, ACID transactions, and years of experience required a lot of effort. Even the most motivated BI engineers can feel lost when presented with the skills, tools, and patterns they need.
First we had to learn new words to talk with our big data colleagues (or even to google things), then we had to learn a new language to explain ourselves, and finally we started building our data pipelines (or are they just ETL processes?).
In this presentation Francesco and Alberto will show how to:
- Identify Bronze, Silver and Gold data and what these labels mean for a BI practitioner.
- Convert an SQL query to Spark syntax.
- Process streaming data with Structured Streaming and SparkSQL.
- Generate surrogate keys in a distributed world.
- And more.
These are problems we had to tackle early to give our engineers the confidence to step into Spark and the cloud. In very little time they naturally started using Scala alongside SparkSQL.
Looking back at our journey, we noticed how much time we could have saved if we had had recommendations or best practices for these problems. This talk is our attempt to share them.
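To give a flavor of the surrogate-key problem from the list above, here is a minimal sketch in plain Python. It is not Spark code and not necessarily the approach shown in the talk. It assumes each worker knows its own partition number and packs that number into the high bits of the key, the same layout Spark documents for its `monotonically_increasing_id` function (partition ID in the upper bits, record number in the lower 33 bits). The names `assign_surrogate_keys`, `ROW_BITS`, and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: the lower 33 bits hold the row counter within a partition,
# and the partition number goes in the bits above them.
ROW_BITS = 33


def assign_surrogate_keys(
    partition_id: int, rows: Iterable[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (surrogate_key, row) pairs for one partition.

    Each partition can run this independently: no shared sequence or
    central counter is needed, because keys from different partitions
    fall into disjoint ranges.
    """
    base = partition_id << ROW_BITS
    for offset, row in enumerate(rows):
        if offset >= (1 << ROW_BITS):
            raise OverflowError("too many rows in one partition")
        yield base + offset, row


# Hypothetical sample data: three partitions of a customer dimension.
partitions: List[List[str]] = [["alice", "bob"], ["carol"], ["dave", "erin", "frank"]]

keyed = [
    pair
    for pid, rows in enumerate(partitions)
    for pair in assign_surrogate_keys(pid, rows)
]
keys = [key for key, _ in keyed]
```

The trade-off compared with a database sequence is that the keys are unique and increasing but not consecutive: there are large gaps between partitions, which a BI practitioner used to dense sequences has to accept or work around.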
Francesco Mucio is a BI architect at Zalando. The first time Francesco met the word data, it was just the plural of datum. Now he’s helping to redraw Zalando’s data architecture. He likes to draw data models and optimize queries. He spends his free time with his daughter, who, for some reason, speaks four languages.
©2019, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.