4–7 Nov 2019

Evolution of a modern cloud-based data lake

15:00–15:45 Thursday, 7 November 2019
Location: M8
Secondary topics:  Best Practice, Case Study
Average rating: 2.50 (2 ratings)

Who is this presentation for?

  • Data engineers, data architects, and product specialists in data

Level

Intermediate

Description

Viacheslav Inozemtsev outlines the experience of building and evolving the cloud-based data lake of a company as large as Zalando. In particular, he addresses three main areas: ingestion of data from the various sources in the company, easy and convenient access to data, and security and governance at the scale of more than 100 teams. He also explores the issue of cost across all three areas.

The first challenge, ingestion of data, is a broad topic on its own. Viacheslav examines the evolution of Zalando’s ingestion pipelines from different company-wide data sources, such as the messaging bus, the data warehouse, and the Google Analytics platform, as well as custom datasets on demand. For the second challenge, access to data, you’ll learn about the evolution of the tools and principles Zalando developed to give the rest of the company convenient means to consume data and extract information from it. The largest challenge is security and governance, even though it doesn’t deliver value directly. Viacheslav explores how the company initially addressed security and access management and how it evolved them later, when better frameworks and services appeared on the market.

Prerequisite knowledge

  • A basic understanding of data engineering and cloud technologies

What you'll learn

  • Learn best practices in building and evolving a data lake, the benefits of using cloud for building a data lake, and key factors of success while building a data lake

Viacheslav Inozemtsev

Zalando

Viacheslav Inozemtsev is a data engineer at Zalando, building an internal data lake platform on top of Apache Spark, Delta Lake, Presto, and serverless cloud technologies, and enabling machine learning and AI for all teams and departments of the company. He has eight years of data and software engineering experience. He earned a degree in applied mathematics, followed by an MSc in computer science with a focus on data processing and analysis.
