Opening Government Funded Knowledge Creation: Increasing Transparency and Scientific Integrity

Open Data and Web Services
Location: Room 202 A

Open access to our body of federally funded research, including not only published papers but also any supporting data and code, is important not just for public knowledge sharing but for the integrity of the research itself. The web has changed how scientific research is communicated – publicly, rather than in print journals behind a paywall – and policy makers have noticed. The White House’s Office of Science and Technology Policy is currently conducting a request for comment on issues of transparency in federally funded scientific research (see ) and the Federal Research Public Access Act (S. 1373) was introduced in the Senate last summer. My theme is that making scientific research available on the web not only gives the public access to taxpayer-funded knowledge; such transparency is also essential for the verifiability and integrity of the science itself. The topic involves a delicate interplay between regulatory environments, funding agency policy, technology, and principles of openness and knowledge transfer.

The talk will begin with an introduction to the current discussion of open access to scientific research, and how this is a component of the larger conversation regarding transparency in government. I’ll describe how both Congress and the White House are including access to science in their policy crafting, and where things stand in terms of legislation and executive directives. The stakes are high: if scientific knowledge is not made open, this not only limits the public’s ability to access and participate in the development of scientific results, it also hinders effective government, as policy decisions are increasingly based on scientific research. Transparency in government becomes synonymous with transparency in science in those cases. Examples are regulation based on climate research (Climategate, for example, should be interpreted as a failure of information sharing) and the OSTP’s recent push toward evidence-based policy. The remainder of the talk focuses on barriers to scientific sharing and emergent policy and technological responses to these barriers.

An important barrier to public access is conflicting Intellectual Property rights. I outline the legal landscape and focus on the problem copyright poses for sharing. I plan to present a solution I proposed in 2009 called the “Reproducible Research Standard.” This standard takes a page from both the Open Source Software movement and Creative Commons’ efforts in facilitating the sharing of artistic works, and proposes a series of open licenses for scientific research output – the data, the code, and the research paper. Work on the implementation of this standard continues with Creative Commons, Science Commons, and through my involvement in government advising. Code patents and ownership of research output will be discussed, including the university’s role and the impact on tech transfer to industry.

Accessible scientific research provides an opportunity to develop open standards and platforms for sharing of science. I plan to show how open licensing does this, and give examples of research projects that are housed on the web as a platform for innovation (see for example). Science is democratizing through user-led innovation in this area. I will point briefly to examples such as Galaxy Zoo, where hobbyists are making real scientific contributions through the careful design of generative web-based sharing platforms and social collaboration.

Challenges exist in the design of collaborative platforms, and in standards for data and code sharing. These challenges constitute the second major barrier to sharing of scientific information. I plan to discuss current efforts, such as GitHub’s outreach to academics, and ongoing work in public repository creation such as PubMed Central. A rapidly evolving area of technological development is in providing semantic structure for sharing scientific work, which seems especially well served by RDFa+HTML tagging vocabularies (I will present my current research in this area).
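
As a rough illustration of what RDFa+HTML tagging of a research output might look like, the sketch below generates a small HTML fragment that names a work and links it to a license. It is not the vocabulary from the research mentioned above: the helper name rdfa_item, the choice of Dublin Core terms, and the example.org URI are all assumptions made for illustration.

```python
from html import escape

# Assumption: Dublin Core terms are used here only as a familiar example
# vocabulary; the talk's own tagging scheme is not specified in the abstract.
DC_NS = "http://purl.org/dc/terms/"


def rdfa_item(uri, title, license_uri):
    """Return an HTML fragment describing one research output with RDFa attributes.

    `about` names the resource, `property` attaches a literal title,
    and `rel="license"` links the resource to its license.
    """
    return (
        f'<div xmlns:dc="{DC_NS}" about="{escape(uri, quote=True)}">'
        f'<span property="dc:title">{escape(title)}</span> '
        f'<a rel="license" href="{escape(license_uri, quote=True)}">license</a>'
        "</div>"
    )


if __name__ == "__main__":
    # The resource URI is a placeholder; the license URL is a Creative Commons one.
    print(rdfa_item(
        "http://example.org/paper/1",
        "Code & data for an example paper",
        "http://creativecommons.org/licenses/by/3.0/",
    ))
```

A crawler or repository that understands RDFa could read the title and license of each item from such markup without any separate metadata file, which is the kind of machine-readable structure the paragraph above has in mind.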

The final point I plan to make is to draw a parallel between the gains from openness in code and openness in science. The idea that more eyes make all bugs shallow applies in scientific research: with more people checking code, data, and results, we are more likely to find errors in our scientific understanding of the world. We are also more likely to generate knowledge that is routinely verifiable and reliable. A very serious consequence of the computer revolution in scientific research is that communication hasn’t caught up: data and code are typically not shared upon publication of the research paper. Standards for openness in scientific research will help avert a growing credibility gap in computational science – if the methods (the data and code) remain hidden, modern computational results remain unverified. I will close with a discussion of the dual benefits of transparency in scientific research: an increase in the quality of our scientific research through the reproducibility of results, and the spread of scientific knowledge beyond the ivory tower.

Victoria Stodden

Yale Law School

Victoria is a Postdoctoral Associate in Law and a Kauffman Fellow in Law at the Information Society Project at Yale Law School. After completing her PhD in statistics, she obtained a Master’s in Legal Studies in 2007 from Stanford Law School where she created a new licensing structure for computational research. Her paper proposing this Intellectual Property framework, called the “Reproducible Research Standard,” won the Kaltura Writing Competition, given in connection with the Third Conference on Access to Knowledge (A2K3) in 2008. She completed her PhD in statistics at Stanford University in 2006 with advisor David Donoho. A component of her dissertation was the development and release of SparseLab, a collaborative platform for distributing code and data underlying published papers that focus on sparse solutions to underdetermined systems of equations.

She is currently co-chairing a working group on Communities and Virtual Organizations in the NSF’s Office of Cyberinfrastructure Task Force on Grand Challenge Communities. She is a Science Commons fellow, a member of the Sigma Xi scientific research society, and the AAAS. She has previously been a postdoc with Eric von Hippel’s Innovation and Entrepreneurship Group at MIT’s Sloan School of Management, and a research fellow at the Berkman Center at Harvard Law School. She has taught quantitative methods as a Lecturer in Law at Stanford Law School, as well as statistics and computing at Stanford University, the University of California at Berkeley, and San Jose State University. She was a summer extern at the U.S. Court of Appeals for the Ninth Circuit with Chief Judge Kozinski and served as Managing Editor of the Stanford Law and Policy Review in 2007. She has been a summer intern at (formerly Xerox PARC) and IBM’s T. J. Watson Research Labs. Her webpage, including talks and publications, is and she occasionally blogs at .
