Algebra for Scalable Analytics

Abstractions are what enable us to think clearly about complex systems. In this talk, we will see how simple abstractions such as monoids can be used to structure analytics platforms. We will look at several interesting cases from the Algebird project, including Bloom filters, HyperLogLog, Count-Min Sketch, and MinHash. We will walk through some example computations, show how they are run on Scalding and Summingbird, and finally discuss some open problems in algebraic algorithm design.
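
As a rough illustration of why a monoid is a useful pattern for analytics, the sketch below defines a minimal monoid (an identity element plus an associative combine) and shows that aggregating per partition and then merging the partial results gives the same answer as aggregating everything at once. It is written in plain Python rather than Scala, and the names Monoid, zero, plus, and sum_in_partitions are hypothetical choices for this example, not Algebird's API. The set-union monoid is an exact stand-in for the distinct-count problem that HyperLogLog answers approximately.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical minimal monoid: an identity value and an associative combine."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, items: Iterable[T]) -> T:
        # Fold all items together, starting from the identity element.
        return reduce(self.plus, items, self.zero)


# Three example monoids; each pairs an identity with an associative operation.
int_add = Monoid(0, lambda a, b: a + b)
int_max = Monoid(float("-inf"), max)
set_union = Monoid(frozenset(), lambda a, b: a | b)


def sum_in_partitions(m: Monoid, partitions: List[list]):
    """Aggregate each partition on its own, then merge the partial results.

    Associativity is what makes this equal to a single pass over all the data.
    """
    partials = [m.sum(p) for p in partitions]
    return m.sum(partials)


events = [3, 1, 4, 1, 5, 9, 2, 6]
partitions = [events[:3], events[3:5], events[5:]]

total_whole = int_add.sum(events)
total_split = sum_in_partitions(int_add, partitions)

largest_split = sum_in_partitions(int_max, partitions)

# Exact distinct count via set union, computed partition by partition.
distinct_split = sum_in_partitions(
    set_union, [[frozenset([x]) for x in p] for p in partitions]
)

print(total_whole, total_split, largest_split, len(distinct_split))
```

Because the platform only needs zero and plus, the same partition-and-merge code can serve sums, maxima, sets, or mergeable sketches without change; this is the sense in which a monoid can act as a pattern for the whole pipeline.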

Oscar Boykin is a staff data scientist at Twitter and a co-author of Algebird, Scalding, and Summingbird.