Google Cloud for Data Crunchers

Beyond Hadoop, Data Science Ballroom H
Tutorial
Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: 2.57 (7 ratings)


Please download the following software in advance of this tutorial (BEFORE you arrive on site):

  • Compute Engine’s gcutil
  • Cloud Storage’s gsutil
  • BigQuery bq
  • App Engine SDK
  • Python 2.6+ (if not already installed)

All users must have a Gmail account.

When data volume and velocity become massive, processing and analysis require specialized technologies for different parts of the data pipeline. Google’s Cloud Platform is designed to help you focus on building applications, not infrastructure. We’ll demonstrate how to build end-to-end Big Data applications, from data collection through analysis to reporting and visualization.

The agenda will include:

  • Using App Engine to collect data from users or remote sensors
  • Using App Engine and Compute Engine to crunch and transform both structured and unstructured data
      • Demonstrating how to get started quickly with MapReduce
  • Using BigQuery for exploring large data sets with ad hoc queries
      • Loading data in from App Engine or Compute Engine
      • Running SQL-like queries on the data
  • Visualizing and reporting results of analysis
      • Using custom code and the Visualization APIs
      • Using Google Spreadsheets
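
For readers unfamiliar with the MapReduce item on the agenda, the following is a minimal pure-Python sketch of the pattern (map, group by key, reduce) applied to a word count. It is only an illustration under stated assumptions: it does not use App Engine's MapReduce library or any Google API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, not taken from the tutorial materials.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three steps in sequence on an in-memory list of strings."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for collected sensor messages.
    sample = [
        "sensor A reports OK",
        "sensor B reports FAIL",
        "sensor A reports OK",
    ]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

A real MapReduce job distributes the map and reduce steps across many machines; this sketch runs everything in one process to show only the shape of the computation.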

Programming experience required. We’ll be using Python for this tutorial, but prior Python experience is not required.


Ryan Boyd


San Francisco-based software engineer, authNZ geek, data geek, and graph geek Ryan Boyd is director of developer relations for Neo4j, an open source graph database that powers connected data analysis in data journalism, cancer research, and some of the world’s top companies. Previously, he was head of developer relations for Google Cloud Platform and worked on more than 20 different APIs and developer products during his eight years at Google. Ryan is the author of Getting Started with OAuth 2.0, published by O’Reilly. Now that he has a young daughter, he no longer skydives but still enjoys the adventures of sailing and cycling.


Michael Manoochehri

Google, Inc.

Michael is a Developer Programs Engineer supporting developers who work with Google Cloud and Apps products. With many years of experience working on Internet media projects for non-profit organizations, he especially enjoys helping educational institutions “Go Google.” Michael has written for a tech blog, has spent time in rural Uganda researching mobile phone use, and has a master’s degree in Information Management and Systems from UC Berkeley’s School of Information.


Julia Ferraioli


Julia Ferraioli is a Senior Developer Advocate with Google’s Open Source Programs Office. She’s a polyglot, though in code only, and is excited about open source sustainability, accessibility, machine learning, containers, and sprinkles (in roughly that order). Her superpowers are finding ways to incorporate her interests into her work and estimating how much stuff can fit inside a container.

Comments on this page are now closed.


Julia Ferraioli
02/27/2013 4:32am PST

Here’s the form I mentioned yesterday at the tutorial to request more access to GCE:

Thanks everyone for coming!

Julia Ferraioli
02/25/2013 10:52pm PST

John, actually that’s for the Google Compute Engine project that we’ll be setting you up with for the session. Skip that step for now!

John Faughnan
02/25/2013 10:48pm PST

The gcutil install page directions refer to: gcutil auth --project=

What it omits is that the value is the name of the Google App Engine project you create via the browser. So, for example:

gcutil auth --project=jfProjectName

That works.

Julia Ferraioli
02/25/2013 3:11pm PST

Nice, Kevin!

Kevin Michel
02/25/2013 3:03pm PST

If you’re on OSX and lazy, here’s a script to fetch and install all deps :

Julia Ferraioli
02/23/2013 5:04am PST

Hi Stephen, great question! We’re going to be giving attendees access to Google Compute Engine for the duration of the tutorial. To set up gcutil, simply download the tar or zip file from our developer site:

Then follow the instructions listed to extract and update your path. We’ll show you how to configure it to access Google Compute Engine during the tutorial.


Stephen Herskovits
02/22/2013 9:37am PST

Hi, the first requirement for setting up gcutil is to apply for access to the cloud compute service. I have done so but have not received any confirmation. Is there a code that attendees should use? Thanks.

