Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

The Snowflake data warehouse: How Sharethrough analyzes petabytes of event data in a SQL database (sponsored by Snowflake)

Dave Abercrombie (Sharethrough)
2:40pm–3:20pm Wednesday, March 7, 2018
Location: 230 B
Average rating: 3.50 (2 ratings)

What you'll learn

  • Learn how Sharethrough used Snowflake to build an analytic and reporting platform that handles petabyte-scale data with ease


Snowflake is an analytic SQL database designed for the cloud. Its flexible autoscaling provides economy and simplifies capacity planning. Snowflake lets you run multiple compute clusters that share data yet remain completely independent, so each can be optimized for a vastly different workload. Even so, it feels like a traditional ANSI SQL database, with features such as atomic transactions, deletes, and SQL roles and privileges. Database users will feel right at home and are freed from system-level database maintenance tasks.
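The independent-cluster and roles model described above can be sketched in Snowflake SQL. All names here (the warehouse, role, and schema) are hypothetical illustrations, not Sharethrough's actual configuration:

```sql
-- Sketch: an auto-scaling compute cluster plus familiar ANSI-style grants.
-- "analytics_wh", "analyst", and "events.public" are hypothetical names.
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 300     -- suspend after 5 idle minutes, stopping compute cost
  AUTO_RESUME = TRUE;    -- resumes transparently on the next query

-- Roles and privileges work as in a traditional SQL database
CREATE ROLE analyst;
GRANT USAGE ON WAREHOUSE analytics_wh TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA events.public TO ROLE analyst;
```

Because warehouses are billed only while running, auto-suspend and auto-resume are what make the autoscaling economical: idle clusters cost nothing, and users never manage them directly.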

Dave Abercrombie explains how Sharethrough used Snowflake to build an analytic and reporting platform that handles petabyte-scale data with ease and demonstrates how to ingest terabytes of event data (JSON in S3), combine it with dimensional lookup data from the application, and drive business and operational decisions. Dave also discusses how old-school SQL roles and privileges can support safe, transparent, low-friction collaboration between internal teams and how his self-healing ETL system, based on Apache Airflow, maintains near-perfect referential integrity across more than a petabyte of data. Dave concludes by explaining how to separate constant ETL ingestion from interactive analytic users by running them on different compute clusters, each optimized for its very different use case.
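The JSON-in-S3 ingestion pattern mentioned above can be sketched with Snowflake's external stages and VARIANT columns. The bucket path, table, and field names below are hypothetical placeholders, not the actual Sharethrough pipeline:

```sql
-- Sketch: land semistructured event JSON from S3 into a VARIANT column.
-- "raw_events", "event_stage", and the S3 URL are hypothetical names.
CREATE TABLE raw_events (payload VARIANT);   -- VARIANT stores arbitrary JSON

CREATE STAGE event_stage
  URL = 's3://example-bucket/events/'
  FILE_FORMAT = (TYPE = 'JSON');

COPY INTO raw_events FROM @event_stage;

-- JSON fields are queryable with path notation and can be joined
-- to dimensional lookup tables like any relational column
SELECT e.payload:placement_id::STRING AS placement_id,
       COUNT(*) AS impressions
FROM raw_events e
GROUP BY 1;
```

Running the COPY on a dedicated ETL warehouse while analysts query from a separate warehouse keeps constant ingestion from ever contending with interactive workloads, since both clusters read and write the same shared data.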

This session is sponsored by Snowflake Technologies.

Dave Abercrombie


Dave Abercrombie is a senior staff engineer at Sharethrough, where his database ingests dozens of terabytes of semistructured data daily to maintain a petabyte-scale database with near-perfect referential integrity. Dave approaches BI from a database perspective. His specialties are data integrity, ETL, robust dimensional design, and both logical and physical database design. He has two decades of experience in database engineering, with the last six years focused on business intelligence on very large databases.