Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Data and privacy at scale at Wikipedia

Nuria Ruiz (Wikimedia)
11:20am–12:00pm Thursday, 09/13/2018
Law, ethics, governance, Strata Business Summit
Location: 1E 12/13
Level: Beginner
Secondary topics: Ethics and Privacy

Who is this presentation for?

  • Everyone will find value in this session.

Prerequisite knowledge

  • Familiarity with the big data ecosystem, HTTP, and how the web works in general

What you'll learn

  • Explore creative ways to calculate metrics in a privacy-conscious way
  • Understand the values of the open knowledge movement


The Wikipedia ecosystem is unique in many ways. It is a top 10 website but not a business. It provides content that you use and trust, yet no one is paid to produce it. Its scale of operation is massive, yet it does not run on any major cloud provider. The configuration of the production stack is public, and the software that powers all of its systems is open source.

A perhaps less well-known fact that makes Wikipedia special is the Wikimedia community’s stance on privacy. The community feels strongly that users and editors have a right to anonymity. It’s a core belief that you shouldn’t have to provide personal information to participate in the free knowledge movement. Yet we live in an era in which any action you take online can be measured and tracked. The Wikimedia Foundation tries to address concerns about how it collects and uses information on the Wikimedia projects in the WMF Privacy Policy, which was drafted in consultation with the community. That discussion totaled 195,000 words (making it longer than The Fellowship of the Ring).

Nuria Ruiz discusses the challenges that this strong privacy stance poses for the Wikimedia Foundation, including how it affects data collection, aggregation, and preservation practices, and details some creative workarounds that allow WMF to calculate metrics in a privacy-conscious way.

Nuria Ruiz


Nuria Ruiz is a full-stack engineer on the analytics team at the Wikimedia Foundation. Before joining the awesome project that is Wikipedia, she spent time working in JavaScript, performance, mobile apps, and web frameworks. Most of her experience deploying large applications comes from the seven years she worked at Amazon. Nuria is a physicist by trade and started writing software 15 years ago in a physical oceanography lab in Seattle.