Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Understanding the data universe with a data catalog

John Haddad (Informatica)

2:40pm–3:20pm Wednesday, March 27, 2019

Executive Briefing and best practices, Strata Business Summit
Location: 2018

Secondary topics: Data preparation, data governance, and data lineage

Average rating: 4.60 (5 ratings)

Who is this presentation for?

Managers of data science, analytics, and data governance teams

Level

Non-technical

What you'll learn

Learn how to use a data catalog for analytic and data governance projects

Description

Before tackling any project, it’s prudent to first take inventory of what’s available, which helps you plan and execute the project quickly and efficiently. It’s often said that data scientists and analysts spend 80% of their time looking for the data they need for an analytics project. Imagine a data analyst at a life sciences or healthcare company working to build an analytic model to improve patient outcomes. There are thousands of possible datasets across the enterprise, ranging from patient clinical data and electronic medical records (EMR) to genomics, claims, billing, patient forums, call detail records, HL7 data, and much more. Where do you even begin?

John Haddad explains how a data catalog can help you find data you need and can trust for analytic and data governance projects. A data catalog that uses AI/ML can recommend data to data scientists and analysts and help them find what they need. It also facilitates collaboration among analytics teams, who help curate the data so that it improves in quality and value over time. Just as a powerful space telescope scans the universe, a data catalog scans and collects metadata from enterprise systems, including many types of databases, applications, and tools. It then automatically builds out a metadata and relationship graph, exposed via REST APIs, so end users and developers can query metadata for use in other applications or integrations.
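To make the idea of a metadata and relationship graph more concrete, here is a minimal sketch in Python. It is not Informatica's implementation or API: the MetadataGraph class, its scan, relate, and query methods, and the system and table names are all hypothetical, and the dictionary returned by query only stands in for the kind of JSON a REST endpoint might return.

```python
class MetadataGraph:
    """Toy in-memory metadata and relationship graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.assets = {}          # asset id -> metadata dictionary
        self.relationships = []   # (source id, relation, target id) triples

    def scan(self, system, objects):
        """Collect metadata for each object found in a source system."""
        for name, metadata in objects.items():
            self.assets[f"{system}.{name}"] = {"system": system, **metadata}

    def relate(self, source, relation, target):
        """Record a relationship between two cataloged assets."""
        self.relationships.append((source, relation, target))

    def query(self, asset_id):
        """Return an asset's metadata plus every relationship that touches it."""
        return {
            "asset": self.assets.get(asset_id),
            "relationships": [
                r for r in self.relationships if asset_id in (r[0], r[2])
            ],
        }


# Hypothetical systems and tables, loosely based on the healthcare example above.
graph = MetadataGraph()
graph.scan("emr_db", {"patients": {"type": "table", "columns": ["patient_id", "dob"]}})
graph.scan("claims_app", {"claims": {"type": "table", "columns": ["claim_id", "patient_id"]}})
graph.relate("claims_app.claims", "references", "emr_db.patients")

result = graph.query("emr_db.patients")
print(result)
```

A real catalog would discover the objects and relationships automatically rather than take them as arguments; the sketch only shows the shape of the data a developer might query.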

A data catalog provides detailed lineage down to the attribute and column level, so analysts can explore the provenance of data and judge whether it can be trusted. Using AI/ML, a data catalog discovers and classifies data, giving users an intuitive search experience that even recognizes synonyms. You can search on business keywords and filter on out-of-the-box or custom facets to find just the data you’re looking for.
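As a rough illustration of column-level lineage and synonym-aware faceted search, the Python sketch below walks a small lineage map upstream and filters a handful of datasets by keyword and facet. Everything in it is assumed for the example: the LINEAGE, SYNONYMS, and DATASETS contents and the upstream_columns and search functions are hypothetical, and a real catalog would derive this information by scanning and classifying systems rather than from hand-written dictionaries.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns it is derived from.
LINEAGE = {
    "mart.outcomes.readmit_rate": ["staging.encounters.readmitted", "staging.encounters.patient_id"],
    "staging.encounters.readmitted": ["emr.visits.discharge_status"],
    "staging.encounters.patient_id": ["emr.visits.patient_id"],
}

# Hypothetical synonym table and dataset index with facets.
SYNONYMS = {"emr": {"ehr"}, "ehr": {"emr"}}
DATASETS = [
    {"name": "ehr_visits", "keywords": {"ehr", "visits"}, "facets": {"domain": "clinical", "type": "table"}},
    {"name": "claims_history", "keywords": {"claims", "billing"}, "facets": {"domain": "finance", "type": "table"}},
    {"name": "emr_notes", "keywords": {"emr", "notes"}, "facets": {"domain": "clinical", "type": "file"}},
]


def upstream_columns(column, lineage):
    """Return every column upstream of `column`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(column, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def search(term, datasets, synonyms, **facets):
    """Match a keyword or any of its synonyms, then keep datasets matching all facets."""
    terms = {term.lower()} | synonyms.get(term.lower(), set())
    return [
        d["name"]
        for d in datasets
        if terms & d["keywords"]
        and all(d["facets"].get(key) == value for key, value in facets.items())
    ]


provenance = upstream_columns("mart.outcomes.readmit_rate", LINEAGE)
all_emr = search("EMR", DATASETS, SYNONYMS)
emr_tables = search("EMR", DATASETS, SYNONYMS, type="table")
print(provenance, all_emr, emr_tables)
```

Here a search for "EMR" also returns the dataset tagged "ehr", and adding the type="table" facet narrows the result to tables only.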

John Haddad

Informatica

John Haddad is vice president at Informatica, where he runs product and technical marketing for the Big Data, Enterprise Data Catalog, and Cloud/Hybrid data management product portfolios. He has over 25 years’ experience developing and marketing enterprise software and has focused on enterprise cloud data management for the last 10 years. Previously, John held various positions in product marketing, R&D, and management at Oracle and Right Hemisphere (acquired by SAP). John holds an AB in applied mathematics from UC Berkeley.
