Apache Spark supports streaming use cases, such as real-time analytics on log files, through a model called discretized streams (D-Streams), which runs “micro-batch” computations over small time intervals.
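To make the micro-batch idea concrete, here is a minimal sketch in plain Python. It does not use Spark's API. The names `discretize` and `count_levels`, the sample log lines, and the one-second interval are assumptions made for illustration only. The sketch cuts a stream of timestamped log lines into fixed intervals and runs an ordinary batch computation on each interval.

```python
from collections import Counter, defaultdict


def discretize(events, interval):
    """Group (timestamp, line) pairs into micro-batches of `interval` seconds.

    Hypothetical helper for illustration; intervals with no events are skipped.
    """
    batches = defaultdict(list)
    for ts, line in events:
        batches[int(ts // interval)].append(line)
    # Return batches in time order, keyed by the start time of each interval.
    return [(idx * interval, batches[idx]) for idx in sorted(batches)]


def count_levels(lines):
    """An ordinary batch computation: count log lines by severity (first token)."""
    return Counter(line.split()[0] for line in lines)


# Made-up sample data: (seconds since start, log line).
events = [
    (0.2, "INFO start"),
    (0.9, "ERROR disk full"),
    (1.1, "INFO retry"),
    (2.5, "ERROR timeout"),
    (2.7, "ERROR timeout"),
]

# Run the same batch computation once per one-second micro-batch.
results = [(start, dict(count_levels(lines)))
           for start, lines in discretize(events, 1.0)]
for start, counts in results:
    print(start, counts)
```

A real deployment would receive the events continuously and distribute each batch across a cluster. The sketch only shows how a stream becomes a sequence of small batch jobs.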
In this talk we will compare several published case studies of production Spark Streaming deployments, based on interviews with the development teams. We will also contrast Spark Streaming with other approaches to streaming at scale, including Google’s MillWheel, Storm at Twitter, and S4 at Yahoo! and Nokia.
One major innovation of Spark Streaming is that it builds on a unified engine. In other words, the same business logic can be reused across multiple use cases: streaming, but also interactive, iterative, machine learning, etc. This talk will present an open source example that integrates Spark Streaming, Spark SQL, and Tachyon within a single app for real-time machine learning updates.
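The reuse of business logic can be sketched in plain Python, again without Spark's API. The function `error_rate` and the sample data are hypothetical, chosen only for illustration. The point is that one function, written once, serves both a historical batch job and each micro-batch of a stream.

```python
def error_rate(lines):
    """Business logic written once: fraction of log lines that are errors."""
    lines = list(lines)
    if not lines:
        return 0.0  # An empty batch has no errors.
    errors = sum(1 for line in lines if line.startswith("ERROR"))
    return errors / len(lines)


# Made-up data: one historical data set and a sequence of micro-batches.
historical = ["INFO a", "ERROR b", "INFO c", "INFO d"]
micro_batches = [["INFO e", "ERROR f"], ["ERROR g"], []]

# Batch use: apply the logic to the whole historical data set.
batch_result = error_rate(historical)

# Streaming use: apply the identical logic to each micro-batch as it arrives.
stream_results = [error_rate(batch) for batch in micro_batches]

print(batch_result, stream_results)
```

In a unified engine the same function would be handed to both the batch and the streaming entry points, so a change to the logic takes effect in both places.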
O’Reilly author (Enterprise Data Workflows with Cascading and the new Just Enough Math) and a “player/coach” who’s led innovative Data teams building large-scale apps. OSS evangelist for Apache Spark (Databricks), workshop instructor (Global Data Geeks), advisor to Zettacap, Amplify Partners, The Data Guild. Expert in machine learning, cluster computing, and Enterprise use cases for Big Data. Interests: Spark, Mesos, PMML, Open Data, Cascalog, Scalding, Python for analytics, NLP.