Sep 23–26, 2019

The why and how of data lineage

Neelesh Salian (Stitch Fix)
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 03
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Software engineers, data engineers, data architects, and product managers




Every data team has to build an ecosystem that sustains the data, its users, and the use of the data itself. This data ecosystem comes with its own challenges during building, maintenance, and enhancement. One of those challenges is the need for a mechanism that lets data be reliably monitored, associated with a purpose, traced, and retrieved. This is the foundation of the idea of data lineage.

Neelesh Salian dives into the importance of data lineage for an organization and how it can impact the data ecosystem. You’ll explore the why and the how behind having a mechanism for data lineage in your organization. The why means understanding the exact need for lineage by examining the use cases it would power, while the how covers the requirements and design needed to build such a mechanism. Neelesh also outlines which tools can be readily used versus building something on your own.
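
As a rough illustration of what "traced and associated with a purpose" could mean in practice, here is a minimal sketch of a lineage store. It is not taken from the talk: the `LineageGraph` class, its method names, and the dataset names are all hypothetical, and a real system would persist this metadata rather than hold it in memory.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical in-memory lineage store (illustrative only).

    Datasets are nodes; an edge runs from each input dataset to the
    dataset derived from it.
    """

    def __init__(self):
        self._parents = defaultdict(set)  # dataset -> direct inputs
        self._purpose = {}                # dataset -> why it exists

    def record(self, output, inputs, purpose):
        """Register that `output` was produced from `inputs` for `purpose`."""
        self._parents[output].update(inputs)
        self._purpose[output] = purpose

    def purpose(self, dataset):
        """Return the recorded purpose, or None if nothing was recorded."""
        return self._purpose.get(dataset)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or not."""
        seen = set()
        stack = [dataset]
        while stack:
            node = stack.pop()
            for parent in self._parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Example dataset names below are made up for the sketch.
graph = LineageGraph()
graph.record("clean_orders", ["raw_orders"], "deduplicate order events")
graph.record("weekly_report", ["clean_orders", "customers"], "weekly sales reporting")

print(sorted(graph.upstream("weekly_report")))
# ['clean_orders', 'customers', 'raw_orders']
```

Tracing upstream answers "where did this data come from?", and the same structure walked in the other direction would answer "what breaks if this dataset changes?".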

Prerequisite knowledge

  • A basic understanding of data infrastructure

What you'll learn

  • Understand data lineage as a mechanism that needs to exist in an organization to enhance the use of its own data

Neelesh Salian

Stitch Fix

Neelesh Srinivas Salian is a software engineer on the data platform team at Stitch Fix, where he works on the compute infrastructure used by the company’s data scientists. Previously, he was at Cloudera, where he worked with Apache projects like YARN, Spark, and Kafka.

