Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

What can we learn from 750 billion GitHub events and 42 TB of code?

Felipe Hoffa (Google)
4:35pm–5:15pm Thursday, September 28, 2017
Data Engineering & Architecture, Data-driven business management
Location: 1A 15/16/17
Level: Intermediate
Secondary topics: Cloud

Who is this presentation for?

  • Anyone interested in measuring open source

Prerequisite knowledge

  • A working knowledge of SQL

What you'll learn

  • Discover how you can analyze the more than five years of GitHub metadata and 42+ terabytes of open source code


“Data gives us insights into how people build software, and the activities of open source communities on GitHub represent one of the richest datasets ever created of people working together at scale.”—GitHub Universe 2016

With Google BigQuery, anyone can easily analyze the more than five years of GitHub metadata and 42+ terabytes of open source code. Felipe Hoffa explains how to leverage this data to understand the community and code related to any language or project. The data is relevant to open source creators, users, and choosers, who can use it to make better choices.

Topics include:

  • How it’s run
  • How coding patterns have changed through time
  • Guiding your project design decisions based on actual usage of your APIs
  • How to request features based on data
  • The most effective phrasing to request changes
  • Effects of social media on a project’s popularity
  • Who starred your project and what other projects interest them
  • Measuring community health
  • Running static code analysis at scale
  • Tabs or spaces?
  • The evolution of the Apache projects on GitHub and their contributors
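
As a rough illustration of the kind of question behind the "Tabs or spaces?" topic, here is a minimal local sketch in Python. It is not taken from the session and does not use BigQuery: it assumes file contents have already been exported as plain strings, and the names `indentation_style`, `tally_styles`, and `sample_files` are hypothetical.

```python
from collections import Counter


def indentation_style(source: str) -> str:
    """Classify one file as 'tabs', 'spaces', 'mixed', or 'none'.

    Only the first character of each line is inspected, so this is a
    deliberately crude heuristic rather than a real parser.
    """
    counts = Counter()
    for line in source.splitlines():
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
    if not counts:
        return "none"  # no indented lines at all
    if counts["tabs"] and counts["spaces"]:
        return "mixed"
    return "tabs" if counts["tabs"] else "spaces"


def tally_styles(files):
    """Count how many files fall into each indentation style."""
    return Counter(indentation_style(text) for text in files)


# Hypothetical sample data standing in for exported file contents.
sample_files = [
    "def f():\n    return 1\n",           # space-indented
    "func main() {\n\tprintln(1)\n}\n",   # tab-indented
    "x = 1\n",                            # no indentation
]

print(tally_styles(sample_files))
```

At the scale described in the abstract, the same per-file classification would presumably be expressed as a query over the hosted dataset rather than run locally.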

Felipe Hoffa


Felipe Hoffa is a developer advocate for big data at Google, where he inspires developers around the world to leverage the Google Cloud Platform tools to analyze and understand their data in ways they never could before. You can find him in videos, in blog posts, and at conferences around the world.

Comments on this page are now closed.


10/20/2017 10:23am EDT

Very interesting session. Could you upload the slides and video?
Thank you