The Wikimedia Foundation (WMF) is a nonprofit charitable organization. As the parent organization of Wikipedia, one of the most visited websites in the world, WMF faces many unique challenges around its ecosystem of editors, readers, and content. Andrew Otto and Fangjin Yang explain how the WMF does analytics and offer an overview of the technology it uses to efficiently process pageviews, which peak at about 200,000 requests per second.
Many folks may not realize that the WMF has a dedicated data analytics team responsible for building out the foundation’s logging and data mining infrastructure and for making Wikimedia-related statistics available to the other teams at the foundation and, perhaps more importantly, to the world at large. Analytics tracking in the Wikimedia movement started with measuring article and editor counts and has grown to support various metrics, summary formats, and visualizations. As the analytics capabilities grow more sophisticated, they play an increasingly important role in guiding decisions.
One of the technologies the foundation leverages for its analytics is the Druid open source project, a column-oriented distributed database. Andrew and Fangjin cover Druid’s architecture and use cases and explain how it has complemented workflows at WMF.
Andrew Otto is a systems engineer at the Wikimedia Foundation, where he supports the analytics team by architecting and maintaining small and big data analytics infrastructure. Previously, Andrew was the lead systems administrator at CouchSurfing.org. He is based in Brooklyn, NY, and spends too much time playing hardcourt bike polo.
Fangjin Yang is a coauthor of the open source Druid project and a cofounder of Imply, a data analytics startup based in San Francisco. Previously, Fangjin held senior engineering positions at Metamarkets and Cisco Systems. Fangjin has a BASc in electrical engineering and an MASc in computer engineering from the University of Waterloo, Canada.
©2017, O'Reilly Media, Inc.