Sixty hours of video are uploaded to YouTube every minute. The Google search index contained 100 million gigabytes of data in 2010. Other Google services have hundreds of millions of users. Each of these products generates massive amounts of data. Google has developed custom technologies to analyze this data and make intelligent product decisions.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, Dremel allows users to run queries in a SQL-like language over tables with billions of rows in seconds. Dremel uses an architecture distinct from MapReduce-based platforms to improve efficiency when running multiple simultaneous query jobs. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google querying web logs, ad analytics and financial data.
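To make the two ideas above concrete, here is a minimal Python sketch of how a columnar layout and a multi-level execution tree can work together. It is a toy illustration under stated assumptions, not Dremel's actual implementation: the function names (`leaf_scan`, `merge`, `run_query`), the column names (`country`, `latency_ms`) and the `fanout` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Columnar layout (assumed for illustration): each shard stores every
# column as its own list, so a query reads only the columns it needs.
Shard = Dict[str, List]
Partial = Tuple[int, int]  # (matching row count, sum of latency_ms)


def leaf_scan(shard: Shard, country: str) -> Partial:
    """Leaf level: scan just two columns of one shard and return a partial result."""
    count = 0
    total = 0
    for c, latency in zip(shard["country"], shard["latency_ms"]):
        if c == country:
            count += 1
            total += latency
    return count, total


def merge(partials: List[Partial]) -> Partial:
    """Intermediate level: combine partial results from child nodes."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def run_query(shards: List[Shard], country: str, fanout: int = 2) -> Partial:
    """Root: fan the query out to the leaves, then merge results level by level."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not shards:
        return (0, 0)
    partials = [leaf_scan(shard, country) for shard in shards]
    # Each pass of this loop plays the role of one level of the tree.
    while len(partials) > 1:
        partials = [
            merge(partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0]
```

In this sketch the leaves never touch columns the query does not mention, and each intermediate level only handles small partial aggregates, which is the intuition behind scanning very large tables quickly. A real system would run the leaves in parallel on separate machines.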
Google’s situation is no longer unique. As more and more companies collect massive amounts of data, they need to quickly analyze it without large investments in infrastructure or human capital. We want everyone to have the power of Dremel.
BigQuery puts the powerful interactive querying capabilities of Dremel into the hands of users everywhere. It is designed for accessibility and ease of use, featuring a REST API as well as a web-based interface. BigQuery enables users to ingest 1 TB of data and run hundreds of queries on it with a SQL-like language in less than an hour.
This session will discuss the development and capabilities of Dremel, in particular its performance characteristics and its ability to support interactive ad-hoc querying on a multi-tenant architecture. We’ll also dive into the design challenges involved in making the Dremel technology accessible and performant for third-party developers and business users working with massive data sets.
Ryan is a Developer Advocate at Google, focused on cloud data services. He’s been at Google for 5 years and previously helped build out the Google Apps ISV ecosystem. He recently published his first book, “Getting Started with OAuth 2.0,” with O’Reilly.
Siddartha has been crunching large data sets at Google since 2005 for a variety of products, and did the same as a physics grad student before that. He has worked with or on almost every data processing framework at Google and is still looking for ways to make his job easier.