Google Search was really designed to look at the web’s linked structure: pages that link to other pages. We are great at analyzing this structure, and have even adapted these techniques to other kinds of structures, such as social networks.
One thing that Google is not good at, though, is finding and then reasoning about structured data.
Structured data files have few links to the outside world and relatively few sites linking to them, so even finding them is hard; they may be buried deep within a website.
Tabular data tends to be brittle and highly structured. You would think this would make it easy to understand and work with, but it actually makes it harder. Given two datasets with wildly different schemas that cover the same topic, how do you decide which one is better? Without the semantic information of the web to guide us, we have to develop entirely new algorithms to categorize this data and understand what it describes.
And then, finally, the feedback problem: the web is its own feedback structure, since links produce a highly connected graph that tells you a lot about which pages are pointed to, and by whom. Structured data has no such linking structure. So how, then, do we use the knowledge that one table is good to find more, and better, tables?
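The link-based feedback described above is the kind of signal that classic link-analysis algorithms such as PageRank exploit. The following is a minimal, illustrative power-iteration sketch of that general idea; it is not Google's production ranking and not a method the talk describes. The function name `pagerank`, the toy three-page graph, and the parameters (damping 0.85, 50 iterations) are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank-style scoring by power iteration.

    links maps each page to the list of pages it links to.
    (Hypothetical helper for illustration only.)
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform score over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline ("teleport") share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy graph: "a" and "b" both point to "c"; "c" points back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
```

Here "c" ends up with the highest score because two pages point to it, while "b", which nothing links to, keeps only the baseline share. A collection of data tables with no links between them would leave every table in the position of "b", which is why this kind of feedback is unavailable for structured data.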
Government data is a uniquely interesting and important subset of structured data: often extremely rich and well curated, but also often massive and hard to find. The goal of the Google government public data search team is to help the world discover and use structured government data.
Christopher is a Simple software engineer. He has previously worked in the Open Source Project Office at Google, and for the US Government. He holds a Ph.D. in computer science from The University of Tulsa.