Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Amazon for information: Building a modern data catalog

Aaron Kalb (Alation)
5:10pm–5:50pm Wednesday, 03/30/2016
Enterprise Adoption

Location: LL21 E/F
Average rating: 4.50 (4 ratings)

A data catalog provides context to help data analysts, data scientists, and other data consumers (including those with little technical background) find a relevant dataset, determine whether it can be trusted, understand what it means, and use it to make better products and better decisions. Aaron Kalb explores how enterprises build interfaces that make sourcing data as easy as shopping on Amazon.

Aaron gives an overview of data catalogs and explains how they relate to concepts like data dictionaries or data inventories. He also covers some of the fastest and most effective ways to build a data catalog, discussing the roles crowds, experts, and machines play.

Topics include:

  • Descriptivism, consumability, traceability, and actionability
  • The roles of data producers, owners, stewards, and curators
  • How some of the enterprises with the world’s largest and most complex data environments are approaching the catalog challenge
  • How to help your organization move past a “card catalog” of data to an “Amazon catalog,” rich with all the information needed to fuel efficient, accurate insight-generation

Aaron Kalb


Aaron Kalb has spent his career crafting and empowering delightful human-computer interactions, especially through natural language interfaces. Aaron currently leads the design team and guides the product vision at Alation. He previously earned a BS and an MS in symbolic systems at Stanford and worked at Apple on iOS and Siri, doing engineering, research, and design in the Advanced Development Group. In his spare time, he enjoys backpacking, board games, and Thai food.