As a payments provider, Stripe has a veritable goldmine of data and lots of uses for it, from checkout conversion analysis to fraud prevention. But until recently, that data was stored only in disparate production systems, and what aggregates we did have were very ad hoc.
We chose to approach this problem iteratively, both to better understand the requirements and constraints and to explore the different technologies available. With some work (and lots of mistakes), we were able to build a system that streams data from our production services into HBase in real time, making it available for analytics using MapReduce, Impala, and other technologies in the ecosystem.
In my presentation, I’ll discuss the various architectures and technologies we tried, what worked well, and the lessons we learned.
Colin Marc is a developer at Stripe, where he’s recently been spending his time building analytics and modeling infrastructure. Besides programming all the things, Colin is also interested in Tuvan throat-singing and iambic tetrameter.
View a complete list of Strata + Hadoop World 2013 contacts