  • Microsoft
  • Nebula
  • Google
  • SugarCRM
  • Facebook
  • HP
  • Intel
  • Rackspace Hosting
  • WSO2
  • Alfresco
  • BlackBerry
  • Dell
  • eBay
  • Heroku
  • InfiniteGraph
  • JBoss
  • LeaseWeb
  • Liferay
  • Media Temple, Inc.
  • OpenShift
  • Oracle
  • Percona
  • Puppet Labs
  • Qualcomm Innovation Center, Inc.
  • Rentrak
  • Silicon Mechanics
  • SoftLayer Technologies, Inc.
  • SourceGear
  • Urban Airship
  • Vertica
  • VMware
  • (mt) Media Temple, Inc.

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the convention, contact Sharon Cordesse at scordesse@oreilly.com.

How to Kill a Patent with Python

Open Data
Location: F150
Tags: patents, nlp, graphs
Average rating: **** (4.00, 2 ratings)

When faced with a patent case, it is essential to find “prior art”: patents and publications that describe a technology before a certain date. The problem is that the indexing mechanisms for patents and publications are not as good as they could be, which makes good prior art searching more of an art than a science. Applying natural language processing and “big data” techniques to the US patent database can produce better results more quickly.
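The date constraint on prior art can be made concrete with a small filter-and-rank step. This is a simplified sketch under stated assumptions. It treats “before a certain date” as publication strictly before a single cutoff date, although actual prior-art rules are more involved. It also assumes that a similarity score has already been computed for each candidate. The `Candidate` type and the `prior_art_candidates` function are hypothetical names used only for illustration.

```python
from datetime import date
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate document for a prior art search."""
    doc_id: str
    published: date
    score: float  # similarity to the patent under review, computed elsewhere


def prior_art_candidates(candidates, cutoff, top_n=10):
    """Keep documents published strictly before cutoff, best-scoring first."""
    eligible = [c for c in candidates if c.published < cutoff]
    return sorted(eligible, key=lambda c: c.score, reverse=True)[:top_n]
```

In this sketch, a highly similar document published after the cutoff is discarded, however relevant its text is.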

  • Part I: The USPTO as a data source. The full text of each patent is available from the USPTO (and now from Google). What does this data look like? How can it be harvested and normalized into data structures that we can work with?
  • Part II: Once the patents have been cleaned and normalized, they can be turned into data structures that let us evaluate their relationships to other documents. This is done in two ways: by modeling each patent both as a document vector and as a graph node.
  • Part IIA: Patents as document vectors. Once a patent is represented as a data structure, we can treat it as a vector in an n-dimensional space. Moving from a document into a vector space touches on normalization, stemming, TF/IDF, Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA).
  • Part IIB: Patents as technology graphs. This part shows how to build graph structures from the connections between patents. These include both the built-in connections in the patents themselves and the connections discovered while working with the patents as vectors. We then apply social network analysis to partition the patent graph and find other documents in the same technology space.
  • Part III: What have we built? Now that we have done all this analysis, we can see some interesting things about the patent database as a whole. How does the patent database act as a map to the world of technology? And how has this helped with the original problem – finding better prior art?
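The normalization and stemming steps named in Part IIA can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code. The stop-word list and the suffix-stripping rules are hypothetical stand-ins, and a real pipeline would more likely use an established stemmer such as the Porter stemmer.

```python
import re

# A deliberately small, hypothetical stop-word list; real pipelines use fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "said", "wherein"}

# Toy suffix rules, checked longest first; a stand-in for a real stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stop words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

After this step, words such as "encoding" and "encoded" map to the same term, so documents that phrase the same idea differently can still match.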
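Once each patent is a list of normalized terms, the TF/IDF weighting from Part IIA turns it into a sparse vector. Cosine similarity between these vectors is one common way to rank related documents. The sketch below makes several assumptions. It expects pre-tokenized input and uses the plain, unsmoothed IDF, log(N / df). The function names `tfidf_vectors`, `cosine`, and `rank_similar` are illustrative. LSI and LDA are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for pre-tokenized documents."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times unsmoothed IDF; terms found in every document get 0.
        vectors[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_id: str, vectors: dict[str, dict[str, float]]) -> list[str]:
    """Return the other document ids, most similar to query_id first."""
    q = vectors[query_id]
    scores = {d: cosine(q, v) for d, v in vectors.items() if d != query_id}
    return sorted(scores, key=scores.get, reverse=True)
```

Sparse dictionaries keep the example short. At the scale of the full patent database, a sparse-matrix library would be the more practical choice.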
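Part IIB can be approximated with a small weighted graph. Its edges are citation links between patents plus similarity links discovered from the vectors. The graph is then partitioned by label propagation, one common community-detection technique. This is an assumption-laden sketch rather than the analysis from the talk. Citation edges are given weight 1.0, similarity edges keep their cosine score, and the `min_similarity` threshold of 0.1 is arbitrary. The function names are hypothetical.

```python
from collections import defaultdict


def build_graph(citations, similarities, min_similarity=0.1):
    """Build an undirected weighted graph as {node: {neighbor: weight}}.

    Citation pairs get weight 1.0 (an assumption). Similarity pairs keep their
    score, and pairs scoring below min_similarity are left out.
    """
    graph = defaultdict(dict)
    for a, b in citations:
        graph[a][b] = graph[b][a] = 1.0
    for (a, b), score in similarities.items():
        if score >= min_similarity:
            # If a pair is both cited and similar, keep the larger weight.
            w = max(score, graph[a].get(b, 0.0))
            graph[a][b] = graph[b][a] = w
    return dict(graph)


def label_propagation(graph, max_rounds=100):
    """Run asynchronous weighted label propagation in sorted node order.

    Each node adopts the label with the largest total edge weight among its
    neighbors. On a tie, it keeps its current label if that label is tied,
    otherwise it takes the smallest tied label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            weight_by_label = defaultdict(float)
            for neighbor, weight in graph[node].items():
                weight_by_label[labels[neighbor]] += weight
            if not weight_by_label:
                continue
            best = max(weight_by_label.values())
            tied = [lab for lab, w in weight_by_label.items() if w == best]
            new = labels[node] if labels[node] in tied else min(tied)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    # Group nodes that share a final label into one community.
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return sorted(communities.values(), key=min)
```

Label propagation is cheap and needs no preset number of communities. Its result can depend on node order and tie-breaking, however, which is why this version processes nodes in a fixed, sorted order.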
Van Lindberg


Van Lindberg has worked professionally as an engineer, as a lawyer, and as an executive. He currently has a dual legal/technical role at Rackspace, and has worked out of both the legal department and the Office of the CTO. In April 2012, the American Bar Association Journal named Van as one of “America’s Top 12 Techiest Attorneys.”

On the legal side, Van leads Rackspace’s Intellectual Property program, directing Rackspace’s strategy and policy around patent, copyright, trademark, trade secret, and open source matters. Van also heads Rackspace’s lobbying efforts on patent reform.

On the technical side, Van runs Rackspace’s technical leadership corps, known internally as the “TCT.” Van also works in technical strategy and ecosystem engagement at Rackspace, identifying emerging technologies, distinguishing differentiating from non-differentiating product elements, and using open source strategies to be more competitive.

Previously, Van worked at the law firm of Haynes and Boone, where he wrote “Intellectual Property and Open Source,” published by O’Reilly and Associates, and grew an open source practice helping businesses with everything from open source compliance to business strategy.

In addition to his open source practice, Van did IP transactional work, patent prosecution, litigation, and post-grant proceedings (ex parte and inter partes reexaminations and reviews).

Van currently serves as chairman of the board of the Python Software Foundation, on the board of the OpenStack Foundation, and as the chair of the Docker Governance Advisory Board.



Bryan Davis
07/28/2011 4:27pm PDT

Van was able to touch on some great material, but unfortunately he had about 2 hours’ worth of topics compressed into his 40-minute time slot.