The Why and How of Data Lineage
Who is this presentation for? Software Engineers, Data Engineers, Data Architects, Product Managers
Prerequisite knowledge: Understanding of data infrastructure
What you'll learn
Every data team has to build an ecosystem that sustains the data, its users, and the ways the data is used. This ecosystem brings its own challenges, both while it is being built and while it is maintained and enhanced. One of these challenges is the need for a mechanism that reliably monitors data, associates it with a purpose, and allows it to be traced and retrieved. This need is the foundation of the idea of data lineage.
It is important to understand the role of data lineage in an organization and how it can affect the data ecosystem. This talk focuses on the why and the how of building a data lineage mechanism in your organization. The why covers the specific need for lineage, examined through the use cases it would power. The how covers the requirements and design needed to build such a mechanism.
After covering the philosophy behind building data lineage, the talk will discuss which existing tools can be used off the shelf versus building something on your own.
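The abstract does not describe a specific design, so the following is only a minimal sketch of the tracing idea. It assumes lineage is modeled as a directed graph in which each dataset records its purpose, the job that produced it, and its upstream inputs. All class, field, dataset, and job names here (`DatasetNode`, `LineageGraph`, `sessionize_job`, and so on) are hypothetical and are not taken from the talk or from any particular lineage tool.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetNode:
    """Hypothetical lineage record for a single dataset."""
    name: str
    purpose: str                                     # why the dataset exists
    produced_by: Optional[str] = None                # job that wrote it, if known
    inputs: List[str] = field(default_factory=list)  # names of upstream datasets


class LineageGraph:
    """Minimal in-memory lineage graph (an illustration, not a real tool's API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, DatasetNode] = {}

    def register(self, node: DatasetNode) -> None:
        self.nodes[node.name] = node

    def upstream(self, name: str) -> List[str]:
        """All datasets `name` transitively depends on, nearest first (breadth-first)."""
        result: List[str] = []
        visited = {name}
        queue = deque(self.nodes[name].inputs)
        while queue:
            current = queue.popleft()
            if current in visited:
                continue
            visited.add(current)
            result.append(current)
            node = self.nodes.get(current)
            if node is not None:
                queue.extend(node.inputs)
        return result

    def downstream(self, name: str) -> List[str]:
        """All datasets affected if `name` changes (impact analysis), nearest first."""
        result: List[str] = []
        visited = {name}
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for node in self.nodes.values():
                if current in node.inputs and node.name not in visited:
                    visited.add(node.name)
                    result.append(node.name)
                    queue.append(node.name)
        return result


# Usage with hypothetical datasets and jobs.
graph = LineageGraph()
graph.register(DatasetNode("raw_events", "raw clickstream ingestion"))
graph.register(DatasetNode("users_raw", "raw user records"))
graph.register(DatasetNode("sessions", "user sessions", "sessionize_job", ["raw_events"]))
graph.register(DatasetNode("user_features", "model features", "feature_job",
                           ["sessions", "users_raw"]))
graph.register(DatasetNode("model_scores", "recommendation scores", "scoring_job",
                           ["user_features"]))

print(graph.upstream("model_scores"))
# → ['user_features', 'sessions', 'users_raw', 'raw_events']
print(graph.downstream("raw_events"))
# → ['sessions', 'user_features', 'model_scores']
```

Tracing upstream answers "where did this data come from?", while tracing downstream answers "what breaks if this dataset changes?". A production system would persist this graph and populate it automatically from job metadata rather than through manual registration.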
Neelesh Srinivas Salian is a Software Engineer on the Data Platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, at Cloudera, he worked with Apache projects such as YARN, Spark, and Kafka.
View a complete list of Strata Data Conference contacts