The why and how of data lineage
Who is this presentation for?
- Software engineers, data engineers, data architects, and product managers
Every data team has to build an ecosystem that sustains the data, its users, and the use of the data itself. This ecosystem comes with its own challenges as it is built, maintained, and enhanced. One of those challenges is the need for a mechanism that lets data be reliably monitored, associated with a purpose, and traced and retrieved. This need is the foundation of data lineage.
Neelesh Salian dives into the importance of data lineage for an organization and how it can impact the data ecosystem. You’ll explore the why and the how behind having a mechanism for data lineage in your organization. The why involves understanding the exact need for lineage by examining the use cases it would power, while the how covers the requirements and design needed to build such a mechanism. Neelesh also outlines which existing tools can be readily used and when you might build something on your own.
Prerequisite knowledge
- A basic understanding of data infrastructure
What you'll learn
- Understand data lineage as a mechanism that needs to exist in an organization to enhance the use of its own data
Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects like YARN, Spark, and Kafka.