Recent years have seen significant evolution in the Internet of Things (IoT): it has become increasingly easy to connect devices to the Internet and send their readings to the public cloud in real time. However, adoption of IoT platforms within the enterprise is clearly lagging, due in part to the challenges of on-premises deployment, the lack of supporting infrastructure, and, in some cases, the lack of the domain expertise required to deploy analytical solutions that demonstrate high value in real-life use cases.
Intel IT has implemented a platform that enables near-real-time stream analytics at scale, using Docker and CoreOS for on-premises deployments. Other IoT analytics implementations settle for basic statistics or make many assumptions about the collected data that usually hold only in specific domains. Intel instead implemented a generic analytics layer that uses machine-learning techniques and advanced statistical tests to provide meaningful insights to enterprise users across different use cases and business domains.
Moty Fania shares Intel IT’s experience implementing this on-premises big data IoT platform for internal use cases. Intel’s ultimate goal was to let users and organizations in the enterprise gain insights and business value while hiding the complexity of building, maintaining, and stabilizing such systems and sparing them the tremendous effort involved. To that end, Intel developed a “smart data pipe” that analyzes massive data streams from many devices simultaneously in near real time. Built on several open source technologies, including Kafka, Spark, and Hadoop (HBase), the platform is optimized to be portable and can be deployed with Docker and CoreOS at the click of a button.
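One common way a Kafka-based pipe can analyze many device streams in parallel is to key each reading by device ID, so that all readings from one device land in the same partition, in arrival order, while different devices spread across partitions. The source does not describe how Intel partitions its streams, so the minimal Python sketch below is an assumption. It uses CRC32 as a stable hash (Kafka's default partitioner uses murmur2 instead), and `partition_for`, `route`, and the device IDs are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]  # (device_id, value)


def partition_for(device_id: str, num_partitions: int) -> int:
    # CRC32 is stable across processes (unlike Python's salted built-in str hash),
    # so a given device always maps to the same partition.
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def route(readings: Iterable[Reading], num_partitions: int) -> Dict[int, List[Reading]]:
    """Group readings by partition; each device's arrival order is preserved."""
    partitions: Dict[int, List[Reading]] = defaultdict(list)
    for device_id, value in readings:
        partitions[partition_for(device_id, num_partitions)].append((device_id, value))
    return partitions


# Hypothetical device IDs and values, for illustration only.
readings = [("fab-tool-7", 1.0), ("sensor-42", 2.0), ("fab-tool-7", 3.0), ("sensor-42", 4.0)]
for partition, items in sorted(route(readings, num_partitions=4).items()):
    print(partition, items)
```

Because each partition can be consumed by a separate worker, per-device analysis (which needs ordered readings from one device) parallelizes naturally across many devices.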
To enable stream analytics at scale, Intel implemented Pigeon, a sophisticated stream-processing framework based on Akka. Akka’s actor model enables us to build a distributed, multithreaded system without the complexities of traditional multithreaded programming. Pigeon implements a cluster that runs processing channels, called topologies, which process the data according to arbitrary user-defined logic. It handles the creation of topologies, balances them across the cluster, and allows nodes to join or leave dynamically. If one or more topologies fail, they are recovered automatically without human intervention.
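The actor-style processing and automatic recovery described above can be illustrated with a minimal Python sketch. It is not Pigeon's or Akka's API: `Topology`, `Supervisor`, and `make_handler` are hypothetical names, a thread with a queue stands in for an Akka actor and its mailbox, and the restart policy (drop the failing message, restart with fresh handler state, keep pending messages) is an assumption loosely modeled on Akka-style supervision.

```python
import queue
import threading
from typing import Callable, Dict

_STOP = object()  # poison pill that shuts a topology down


class Supervisor:
    """Hypothetical supervisor: restarts any topology whose handler raises."""

    def __init__(self) -> None:
        self.restarts: Dict[str, int] = {}
        self._lock = threading.Lock()

    def on_failure(self, topology: "Topology", error: Exception) -> None:
        with self._lock:
            self.restarts[topology.name] = self.restarts.get(topology.name, 0) + 1
        topology.start()  # fresh handler state; the mailbox (pending messages) survives


class Topology:
    """Hypothetical topology: one mailbox drained by a single worker thread.

    Only one thread runs the handler at a time, so user logic needs no locks."""

    def __init__(self, name: str, make_handler: Callable[[], Callable[[float], None]],
                 supervisor: Supervisor) -> None:
        self.name = name
        self.make_handler = make_handler
        self.supervisor = supervisor
        self.mailbox: "queue.Queue[object]" = queue.Queue()

    def start(self) -> None:
        handler = self.make_handler()
        threading.Thread(target=self._run, args=(handler,), daemon=True).start()

    def tell(self, msg: object) -> None:
        self.mailbox.put(msg)  # asynchronous, fire-and-forget send

    def _run(self, handler: Callable[[float], None]) -> None:
        while True:
            msg = self.mailbox.get()
            try:
                if msg is _STOP:
                    return
                handler(msg)
            except Exception as err:
                # The failing message is dropped. The supervisor starts a replacement
                # worker, and this one exits without taking another message, so the
                # handler never runs concurrently.
                self.supervisor.on_failure(self, err)
                return
            finally:
                self.mailbox.task_done()


results = []


def make_handler() -> Callable[[float], None]:
    # A real topology would rebuild its windows or models here; this one is stateless.
    def handle(reading: float) -> None:
        if reading < 0:
            raise ValueError(f"corrupt reading: {reading!r}")
        results.append(reading * 2)
    return handle


supervisor = Supervisor()
voltage = Topology("voltage", make_handler, supervisor)
voltage.start()
for reading in [1, 2, -1, 3]:
    voltage.tell(reading)
voltage.mailbox.join()  # block until every message has been handled
voltage.tell(_STOP)
print(results, supervisor.restarts)  # [2, 4, 6] {'voltage': 1}
```

The corrupt reading crashes the worker, the supervisor restarts it, and the next reading is processed without any manual step. A clustered framework would additionally place topologies on nodes and move them when nodes join or leave, which this single-process sketch does not attempt.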
Intel also provides an analytical layer on top of this smart data pipe that yields meaningful insights for diverse users and addresses situations in which there is no prior knowledge about the time series data before it starts arriving, so “normal” behavior or distribution cannot be assumed. This analytical layer includes four main components:
Moty shares the results of applying these analytic capabilities to several datasets, both internal and external. For example, in one proof of concept (POC), they helped identify a fab tool that caused a yield problem; in another, they revealed malfunctioning electrical-network voltage sensors. This is a promising direction for enabling the anticipated IoT analytics revolution in the enterprise.
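The source does not name the statistical tests used in Intel's analytical layer. To illustrate the kind of distribution-free check that the "no assumed normal distribution" requirement described above calls for, here is a minimal Python sketch built on a swapped-in technique, the two-sample Kolmogorov-Smirnov test, which compares a reference window with the most recent readings without assuming any distributional form. `ShiftDetector`, the window size, and learning the reference from a sensor's first readings are assumptions for illustration.

```python
import math
from bisect import bisect_right
from collections import deque
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Both empirical CDFs are step functions that change only at sample points,
    # so evaluating them at every observed value finds the maximum gap.
    return max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)


def distribution_shifted(reference: Sequence[float], recent: Sequence[float],
                         alpha: float = 0.05) -> bool:
    """Reject 'same distribution' at level alpha using the large-sample critical value."""
    n, m = len(reference), len(recent)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(reference, recent) > c_alpha * math.sqrt((n + m) / (n * m))


class ShiftDetector:
    """Hypothetical per-sensor detector: the first `window` readings become the
    reference (so "normal" is learned, not assumed); later readings fill a sliding window."""

    def __init__(self, window: int = 50, alpha: float = 0.05) -> None:
        self.window = window
        self.alpha = alpha
        self.reference: List[float] = []
        self.recent: deque = deque(maxlen=window)

    def update(self, value: float) -> bool:
        """Feed one reading; return True when the recent window has drifted."""
        if len(self.reference) < self.window:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False
        return distribution_shifted(self.reference, list(self.recent), self.alpha)


# Synthetic readings: a steady pattern, then the same pattern offset by 10.
detector = ShiftDetector(window=50)
normal = [230 + (i % 10) * 0.1 for i in range(100)]
drifted = [240 + (i % 10) * 0.1 for i in range(50)]
flags = [detector.update(v) for v in normal + drifted]
print("first alert at reading", flags.index(True))
```

The critical value is the asymptotic approximation, so it is only reliable for reasonably large windows, and re-sorting the full window on every reading costs O(w log w); a production system would more likely test on a schedule or maintain the statistic incrementally.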
Moty Fania is a principal engineer for big data analytics at Intel IT, where he drives the overall technology and architectural roadmap and owns development and architecture. Moty has more than 13 years of experience in BI, data warehousing, and decision-support solutions. He holds a bachelor’s degree in computer science and economics and a master’s degree in business administration from Ben-Gurion University in Israel.
©2016, O'Reilly Media, Inc. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.