Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Executive Briefing: Overview of data governance

Paco Nathan
4:20pm–5:00pm Wednesday, March 27, 2019
Average rating: 3.67 (6 ratings)



Effective data governance is foundational for AI adoption in the enterprise, but it’s an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, processes, standards, and more, based in part on interviews with experts in the field about issues and best practices. Join in to learn what impact machine learning has on data governance and vice versa.

On the one hand, poor data governance can lead to system data quality issues, lack of data availability, and other risks, which means that people within an organization cannot leverage data effectively, thus limiting ROI. On the other hand, there’s a galaxy of compliance issues aimed at preventing the risks that arise when people use data inappropriately. (While many are quick to mention GDPR, there are many standards in play, depending on the business vertical.) Several other issues also drive data governance: how security concerns are reshaping the structure of web apps; ethics, bias, and other needs for ML transparency; the implications of “democratizing data and analytics,” both pro and con; the priority for reproducibility in analytics workflows; and the unexpected ways in which open source is evolving rapidly in highly regulated environments.

Although many practices emerged from the era of data warehouses, big data changed the game and began drawing attention from regulators. Facebook, Twitter, and other tech giants now testify before US senators, who in turn struggle to grasp basic concepts in IT. Meanwhile, the IT landscape is evolving rapidly: new forms of hardware and networking, serverless cloud offerings, and edge computing, for example, are redefining even the basic concepts related to data governance. Ultimately, risk management serves as the “thin edge of the wedge” for these changes in the enterprise, while the mantle of responsibility for data governance moves toward the emerging chief data officer role.

Topics include:

  • History, themes, and current drivers regarding data governance in industry
  • A survey of tools, vendors, process, standards, open source projects, etc.
  • Interviews with experts about issues and best practices
  • Security concerns, ethics and bias in ML, highly regulated environments, “democratizing data,” and workflow reproducibility
  • The impact machine learning has on data governance and vice versa
  • The role risk management plays as the “thin edge of the wedge” for these changes in enterprise
  • How the emerging chief data officer role fits in

Paco Nathan

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the “top 30 people in big data and analytics” in 2015 by Innovation Enterprise.



Paco Nathan, 03/30/2019 3:34pm PDT

Slides for my talk are online at: