The why and how of data lineage

Neelesh Salian (Stitch Fix)

5:25pm–6:05pm Wednesday, September 25, 2019

Location: 1A 03

Data Engineering and Architecture

Secondary topics: Data quality, data governance and data lineage, Retail and e-commerce

Who is this presentation for?

Software engineers, data engineers, data architects, and product managers

Level

Intermediate

Description

Every data team has to build an ecosystem that sustains the data, its users, and the use of the data itself. This ecosystem comes with its own challenges during building, maintenance, and enhancement. One of those challenges is the need for a mechanism that lets data be reliably monitored, associated with a purpose, traced, and retrieved. That need is the foundation of data lineage.

Neelesh Salian dives into the importance of data lineage for an organization and how it can impact the data ecosystem. You’ll explore the why and the how behind having a mechanism for data lineage in your organization. The why involves understanding the exact need for lineage by examining the use cases it would power, while the how covers the requirements and design needed to build such a mechanism. Neelesh also outlines which tools you can readily use instead of building something on your own.

Prerequisite knowledge

A basic understanding of data infrastructure

What you'll learn

Understand data lineage as a mechanism that needs to exist in an organization to enhance the use of its own data

Neelesh Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects such as YARN, Spark, and Kafka.