Making Big Data Small

Baron Schwartz (VividCortex)
Data-Driven Business Beekman Parlor -- Sutton North

Today it almost seems fashionable to capture, store, and process “everything,” because we can. But there’s a real cost to this approach, and in many cases the ultimate goal might be served nearly as well by a Small Data mindset.

In this session I will share my tricks for reducing many problems from a Big Data, Big Compute solution to a comparatively small and cheap one. The savings can be as big as you want them to be, including “infinite” (yes, with air-quotes) in some cases. Not every problem is amenable to this kind of solution, but many are.

In general, data collection, storage, retrieval, and processing can all be characterized by the cost and resources required for storage, bandwidth, and computation. Each of these often offers opportunities for a cost-versus-accuracy tradeoff. Consider Bloom Filters, for example, which answer a yes-no question with either “probably yes” or “definitely no” and are extremely cheap relative to the cost of a “definitely yes/no” answer.
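To make the “probably yes” or “definitely no” tradeoff concrete, here is a minimal Bloom filter sketch in Python. It is not taken from the talk; the class name `BloomFilter`, its parameters, and the choice of SHA-256 with double hashing are all assumptions made for illustration. The bit-array size and hash count come from the standard sizing formulas for a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: a fixed bit array plus k derived hash positions."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            8, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means only "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    seen.add(f"user-{i}")
```

Under these assumptions the filter for 1,000 items occupies roughly 1.2 KB no matter how long the items themselves are, which is where the saving comes from. The price is that `might_contain` can return `True` for an item that was never added, at a rate near the configured target.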

If you’re not familiar with Bloom Filters, I’ll cover them, along with a variety of other techniques: exponential moving averages, discarding strong correlates, pre-filtering, sparse collection and storage, histograms, statistical metrics, sampling, and modeling. Each of these offers a tradeoff that’s worth considering.
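As one illustration of the techniques listed above, an exponential moving average summarizes an arbitrarily long stream in a single stored number. The sketch below is an assumption-laden example rather than the speaker's implementation; the class name `ExponentialMovingAverage` and the smoothing parameter `alpha` are hypothetical names chosen here.

```python
class ExponentialMovingAverage:
    """Illustrative EMA: constant memory regardless of how many samples arrive."""

    def __init__(self, alpha: float):
        # alpha near 1 tracks recent samples closely; alpha near 0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no estimate until the first sample is seen

    def update(self, sample: float) -> float:
        if self.value is None:
            # Seed the average with the first sample.
            self.value = float(sample)
        else:
            # Move the estimate a fraction alpha of the way toward the new sample.
            self.value += self.alpha * (sample - self.value)
        return self.value


latency = ExponentialMovingAverage(alpha=0.5)
readings = [latency.update(x) for x in (10, 20, 30)]
```

The tradeoff is that individual samples cannot be recovered afterwards: older values fade geometrically, so the result is a recency-weighted summary rather than an exact mean.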

In addition, I’ll share my general approach to finding Small Data solutions to all kinds of Big Data problems. I don’t have a fancy name for it, but I do have a process that works well for me, and I believe it may be useful to you too.

Baron Schwartz

VividCortex

Baron Schwartz is the founder and CEO of VividCortex, the best way to see what your production database servers are doing. He is the lead author of High Performance MySQL and the author of a variety of open source software.

Comments on this page are now closed.

Comments

Marek Kolodziej
10/30/2013 4:36pm EDT

Would it be possible to post the slides here, like the other speakers have?

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts