Sketching Techniques for Real-time Big Data

Beyond Hadoop, Great America Ballroom J
Average rating: 4.25 (4 ratings)

In many modern web and big data applications, data arrives as a stream and must be processed on the fly. The data is usually too large to fit in main memory, so computations have to be done incrementally as new pieces of data arrive. To make this possible, we design sketches of the data: compact summaries that take only a small amount of memory yet support fast queries and updates. Such sketches are useful both in applications that run on a single machine and in those that run on distributed systems such as Twitter Storm. We will present the techniques used to design these sketches and illustrate how to apply them with a number of examples, including frequent itemsets (used, for example, in retail product recommendations), clustering, and heavy hitters (used, for example, in fraud and intrusion detection).
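To give a rough feel for what such a sketch can look like, here is a minimal Python illustration of the heavy hitters problem using the Misra-Gries summary. This is a well-known technique chosen for illustration only; the abstract does not say which algorithms the talk covers, and the class name, method names, and parameter k below are assumptions of this example.

```python
class MisraGries:
    """Heavy-hitters summary that keeps at most k - 1 counters.

    Illustrative sketch only; names and parameters are hypothetical.
    """

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.counters = {}  # item -> (under)estimated count
        self.n = 0          # number of stream items seen so far

    def update(self, item):
        """Process one stream item incrementally."""
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def estimate(self, item):
        """Lower bound on the item's true count (off by at most n / k)."""
        return self.counters.get(item, 0)

    def candidates(self):
        """Superset of all items that occur more than n / k times."""
        return set(self.counters)


# Toy stream: two frequent items followed by 20 distinct rare items.
stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
sketch = MisraGries(k=4)
for item in stream:
    sketch.update(item)
print(sketch.candidates(), sketch.estimate("a"))
```

Memory stays bounded by k - 1 counters regardless of stream length, and each update touches at most k - 1 entries. The price is that counts are underestimates, so a second pass or a separate exact count is needed if false positives among the candidates matter.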

Bahman Bahmani

Rakuten

Bahman Bahmani is a director of data science at Rakuten (the seventh largest internet company in the world), where he manages an AI organization of engineering and data science managers, data scientists, machine learning engineers, and data engineers distributed across three continents. The organization is in charge of the end-to-end AI systems behind the Rakuten Intelligence suite of products. Previously, Bahman built and managed engineering and data science teams across industry, academia, and the public sector in areas including digital advertising, consumer web, cybersecurity, and nonprofit fundraising, where he consistently delivered substantial business value. He also designed and taught courses, led an interdisciplinary research lab, and advised theses in the computer science department at Stanford University, where he earned his PhD with a focus on large-scale algorithms and machine learning, topics on which he is a published author.

Comments

Bahman Bahmani
02/28/2013 3:44pm PST

You can find the slides on the talk’s webpage: http://strataconf.com/strata2013/public/schedule/detail/27311

Mario Brenes
02/27/2013 2:01pm PST

Where can I get your presentation materials?
