Real-time Big Data Without Streaming

Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)

There has been a lot of excitement lately about streaming approaches to handling Big Data, such as Storm, S4, SQLStream, and InfoStreams. But many of the use cases for streaming Big Data can be better handled by integrating more established technologies: batch analytics with Hadoop, low-latency access with NoSQL databases, and search indexing.

The integrated Big Data approach relies on local logic in an app server that reads indexes and NoSQL databases, updates pre-computed scores, writes the updates back to NoSQL, and then responds.
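The request path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryStore`, `BATCH_WEIGHTS`, `handle_event`, and the threshold rule are hypothetical names standing in for a real NoSQL client, batch-computed model output, and application logic, none of which are specified in the talk abstract.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a NoSQL profile store (key -> document)."""
    docs: dict = field(default_factory=dict)

    def get(self, key):
        # Unknown users start with an empty profile.
        return self.docs.get(key, {"events": 0, "score": 0.0})

    def put(self, key, doc):
        self.docs[key] = doc


# Hypothetical stand-in for batch-derived data (for example, per-category
# weights that a periodic Hadoop job might recompute and publish).
BATCH_WEIGHTS = {"sports": 0.8, "finance": 0.3}


def handle_event(store, user_id, category, threshold=1.0):
    """Read the profile, update its pre-computed score, write it back, respond."""
    profile = store.get(user_id)                   # read from the store
    weight = BATCH_WEIGHTS.get(category, 0.1)      # read batch-derived data
    updated = {
        "events": profile["events"] + 1,
        "score": profile["score"] + weight,        # incremental score update
    }
    store.put(user_id, updated)                    # write the update back
    # Decide the response from the updated score (assumed rule).
    return "personalize" if updated["score"] >= threshold else "default"
```

All state lives in the store, so the handler itself stays stateless and each request touches only the one profile it needs.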

We compare and contrast these two approaches, first identifying patterns for effectively integrating Hadoop, NoSQL, and distributed Lucene, and then looking at emerging streaming Big Data technologies.

We look at common patterns that require near real-time Big Data and contrast the two approaches:

  • Event responses: when an event occurs, a profile needs to be read and updated, and a response decided based on a machine learning model. We look at examples such as personalizing web pages, and assessing health and generating alerts for device data.
  • Operational intelligence, such as system performance statistics, online sales, or top or trending items
  • Detecting and updating models as new events come in (e.g., for fraud or security attack detection)
  • Complex low-latency queries (also known as Distributed RPC)
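As one illustration of the operational intelligence pattern, top or trending items can be served without a streaming engine by keeping small time-bucketed counters and summing the most recent buckets at query time. The sketch below is a minimal in-memory version; the `BucketedCounts` class, the one-minute bucket size, and the five-bucket window are assumptions made for the example, not details from the talk.

```python
from collections import Counter, defaultdict


class BucketedCounts:
    """Per-bucket item counters, as might be kept in a key-value store.

    The layout (bucket id -> item -> count) is a hypothetical choice.
    """

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(Counter)

    def record(self, item, timestamp):
        # Increment the counter for the bucket this event falls into.
        bucket_id = int(timestamp // self.bucket_seconds)
        self.buckets[bucket_id][item] += 1

    def top(self, now, window_buckets=5, n=3):
        """Sum the most recent buckets and return the n most frequent items."""
        current = int(now // self.bucket_seconds)
        total = Counter()
        for bucket_id in range(current - window_buckets + 1, current + 1):
            total.update(self.buckets.get(bucket_id, Counter()))
        return total.most_common(n)
```

Writes touch a single counter and reads sum a fixed number of buckets, so both stay cheap regardless of the total event volume; old buckets can simply be expired.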

This survey brings us to some key factors that determine the value of streaming versus more localized low-latency approaches:

  • Complex correlation across events
  • Complexity of downstream integration
  • Application logic

Ron Bodkin

Google

Ron founded Think Big Analytics to help customers leverage new data processing technologies such as Hadoop, NoSQL databases, and R for statistical analysis. He works with customers to identify opportunities and rapidly develop solutions that integrate data and extract information.

Previously, Ron was the VP of Engineering for Quantcast. Each day Quantcast ingests 10 billion events and processes two petabytes of data using Hadoop. The Quantcast MapReduce stack handles production data processing, ad hoc analysis, data mining, and machine learning. Prior to that, Ron was a founder of the enterprise consulting companies C-bridge and New Aspects.

Comments on this page are now closed.

Comments

Shirley Bailes
10/30/2012 7:02am EDT

@Maryanne, @Rod, they have been made available above under "presentation": bitly.com/Rpm2cd

Ron Bodkin
10/29/2012 10:14am EDT

Hi Maryanne – I just uploaded them so they should be available by tomorrow.

Maryanne DellaSalla
10/29/2012 10:08am EDT

Are the slides available for this session? Thank you!
