Processing high-volume event streams in real time, robustly and efficiently, poses considerable challenges. Throwing raw processing power at the problem is one way to address them, but there are more efficient approaches, particularly when the analysis task focuses on interesting points or can tolerate approximate results. In this talk we’ll cover what we call realtime data analysis patterns, spanning data acquisition, processing, and storage of historic data, always making sure that the resulting system delivers constant performance. The resulting architecture has approximate algorithms at its core and combines in-memory and disk-based storage. We address questions such as: How can we make sure we can ingest tens of thousands of events per second? How can we keep track of millions of objects with bounded resources? How do we integrate with existing infrastructure? Finally, we will discuss several use cases, including social media data, realtime user profiling and recommendation, and realtime analytics.
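The abstract does not name the algorithms used in the talk. As one illustration of what tracking many objects with bounded resources can look like, here is a minimal sketch of the well-known Space-Saving heavy-hitters technique. The class name `SpaceSaving` and its methods are hypothetical and chosen for this example; this is not presented as Streamdrill's implementation.

```python
class SpaceSaving:
    """Approximate top-k counter that never holds more than `capacity` keys.

    Illustrative sketch only: names are hypothetical and not taken from
    the talk or from any Streamdrill API.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts = {}  # key -> estimated count (never an underestimate)
        self.errors = {}  # key -> largest possible overestimate for that key

    def add(self, key):
        if key in self.counts:
            # Already tracked: exact increment.
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            # Free slot available: start tracking with no error.
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # Table is full: evict the key with the smallest count and let
            # the newcomer inherit that count as its possible error.
            # The linear scan keeps the sketch short; a heap or linked
            # "stream summary" structure would avoid it.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[key] = floor + 1
            self.errors[key] = floor

    def top(self, n):
        """Return the n keys with the highest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]
```

Memory use is fixed by `capacity` no matter how many distinct keys appear in the stream. The price is that counts for rarely seen keys may be overestimated, by at most the value stored in `errors`.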
Mikio Braun is co-founder of Streamdrill, a startup focused on approximative approaches for real-time big data. He holds a Ph.D. in Machine Learning and worked in research for a number of years before becoming interested in putting research results to good use in industry. His current interests center on real-time data analysis, in particular approximative approaches that go beyond scaling.
©2015, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • conf-webmaster@oreilly.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.