Over the past year, Spark Streaming has emerged as the leading platform for implementing IoT and similar real-time use cases. There are successful implementations across a diverse spectrum of industries, from consumer internet and mobile to healthcare and traditional manufacturing.
We will start with a brief introduction to Spark Streaming’s micro-batch architecture for real-time stream processing. However, the primary focus of the talk will be on end-to-end architectures and use cases. We will give a walkthrough, and a live demo, of an example use case that processes and alerts on time-series data (such as sensor data). The walkthrough covers the full path: ingesting the time-series data streams with Kafka, processing them in Spark Streaming to identify egregious conditions, and sending alerts as Kafka events.
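The ingest, detect, and alert flow of this use case can be sketched in plain Python to show the micro-batch idea without a cluster. This is a minimal simulation, not Spark's API: the `Reading` type, the per-sensor maximum-over-threshold rule, and the alert format are hypothetical. Readings are grouped by their own timestamps for simplicity, whereas Spark Streaming forms each batch from the data received during a batch interval. Alerts are returned as strings where the described pipeline would publish them as Kafka events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading; in the talk's pipeline these would arrive on a Kafka topic."""
    sensor_id: str
    timestamp: float  # seconds
    value: float


def micro_batches(readings: Iterable[Reading], batch_interval: float) -> List[List[Reading]]:
    """Group readings into consecutive fixed-width time buckets, one list per non-empty bucket."""
    buckets: Dict[int, List[Reading]] = defaultdict(list)
    for r in readings:
        buckets[int(r.timestamp // batch_interval)].append(r)
    return [buckets[k] for k in sorted(buckets)]


def alerts_for_batch(batch: List[Reading], threshold: float) -> List[str]:
    """Emit one alert per sensor whose maximum value within the batch exceeds the threshold."""
    worst: Dict[str, float] = {}
    for r in batch:
        worst[r.sensor_id] = max(worst.get(r.sensor_id, r.value), r.value)
    # In the described architecture these messages would be produced to a Kafka topic instead.
    return [f"ALERT {sid}: max={v}" for sid, v in sorted(worst.items()) if v > threshold]


if __name__ == "__main__":
    readings = [
        Reading("s1", 0.5, 70.0),
        Reading("s2", 1.2, 95.0),
        Reading("s1", 2.3, 101.0),
        Reading("s2", 3.9, 60.0),
    ]
    for batch in micro_batches(readings, batch_interval=2.0):
        print(alerts_for_batch(batch, threshold=90.0))
```

With a 2-second interval the four readings fall into two batches, and each batch raises an alert only for the sensor that crossed the threshold within that batch.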
Alerting and visualization often go together. After all, when something goes wrong, the investigation entails visualizing relevant events and metrics. We will extend our architecture by showing how the time-series output of Spark Streaming can be written to HBase or OpenTSDB, so that it can be served to a front end for visualization.
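As a rough illustration of the OpenTSDB path, the sketch below shapes one streaming output record into the JSON form accepted by OpenTSDB's `/api/put` HTTP endpoint, and into the equivalent telnet-style `put` line. The metric name `sensor.temperature` and the `sensor_id` tag are hypothetical, and the actual write (an HTTP POST, or the HBase alternative) is left out.

```python
import json
from typing import Dict, List


def to_opentsdb_datapoint(metric: str, timestamp: int, value: float, tags: Dict[str, str]) -> Dict:
    """Build one data point in the JSON shape used by OpenTSDB's /api/put endpoint."""
    if not tags:
        # OpenTSDB requires at least one tag pair per data point.
        raise ValueError("OpenTSDB data points need at least one tag")
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": dict(tags)}


def to_put_body(points: List[Dict]) -> str:
    """Serialize a batch of data points as a JSON array for a single /api/put request."""
    return json.dumps(points)


def to_telnet_put(point: Dict) -> str:
    """Render the same data point as a telnet-style 'put' line (tags sorted by key)."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
    return f"put {point['metric']} {point['timestamp']} {point['value']} {tag_str}"


if __name__ == "__main__":
    p = to_opentsdb_datapoint("sensor.temperature", 1420070400, 101.0, {"sensor_id": "s1"})
    print(to_put_body([p]))
    print(to_telnet_put(p))
```

Batching all points from one micro-batch into a single `/api/put` request body keeps the number of round trips to OpenTSDB proportional to the number of batches rather than the number of readings.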
In addition to the above use case, we will highlight some of the high-level operators and libraries available in Spark Streaming that make it easy to implement IoT use cases:
We will share some pro tips for:
Last, we will describe how to monitor your long-running streaming applications, and highlight some recent and upcoming improvements in monitoring.
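For long-running jobs, a useful signal is whether each batch finishes within the batch interval; the Spark Streaming web UI reports per-batch scheduling delay and processing time for this purpose. The sketch below is a plain-Python model, not a Spark API. It assumes batches are processed one at a time, shows how scheduling delay accumulates once processing time exceeds the interval, and uses a hypothetical `is_falling_behind` rule of thumb that flags a steadily growing delay.

```python
from typing import List


def scheduling_delays(processing_times: List[float], batch_interval: float) -> List[float]:
    """Model how long each batch waits before it starts, assuming one batch runs at a time."""
    delays: List[float] = []
    free_at = 0.0  # time at which the single processing slot becomes free
    for i, proc in enumerate(processing_times):
        submitted = i * batch_interval  # batch i is ready at the end of its interval
        start = max(submitted, free_at)
        delays.append(start - submitted)
        free_at = start + proc
    return delays


def is_falling_behind(delays: List[float], window: int = 3) -> bool:
    """Flag the job when scheduling delay strictly increased across the last `window` batches."""
    recent = delays[-window:]
    return len(recent) == window and all(b > a for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    healthy = scheduling_delays([1.5, 1.5, 1.5], batch_interval=2.0)
    overloaded = scheduling_delays([3.0, 3.0, 3.0, 3.0], batch_interval=2.0)
    print(healthy, is_falling_behind(healthy))
    print(overloaded, is_falling_behind(overloaded))
```

When processing takes 1.5 s per 2 s batch, the delay stays at zero; at 3 s per batch, every batch starts one second later than the previous one, which is the backlog pattern worth alerting on.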
Hari Shreedharan is a software engineer at Cloudera, an Apache Flume committer/PMC member, and a Spark contributor. He is the author of the O’Reilly Media book Using Flume.
Anand Iyer is a senior product manager at Cloudera, the leading vendor of open source Apache Hadoop. His primary areas of focus are platforms for real-time streaming, Apache Spark, and tools for data ingestion into the Hadoop platform. Before joining Cloudera, Anand worked as an engineer at LinkedIn, where he applied machine-learning techniques to improve the relevance and personalization of LinkedIn’s Feed. Anand has extensive experience leveraging big data platforms to deliver products that delight customers. He holds a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.
©2015, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.