The Internet of Things has evolved significantly in recent years. It has become increasingly easy to connect devices to the Internet and send sensor data to the public cloud. However, adoption of IoT platforms and stream analytics within the enterprise is lagging, due in part to companies’ lack of the expertise and skills required to deploy an on-premises platform and demonstrate high value through real-life use cases.
Moty Fania shares Intel’s IT experience implementing an on-premises IoT platform for internal use cases. The platform was based on open source big data technologies and containers and was designed as a multitenant platform with built-in analytical capabilities. Moty highlights the key lessons learned from this journey and offers a thorough review of the platform’s architecture.
Intel IT’s goal was to allow users and organizations in Intel to gain insights and business value from real-time analytics and become more proactive. Intel deployed a platform based on several open source technologies, including Akka, Kafka, and Spark Streaming, with a full stack of algorithms such as multisensor change detection, anomaly detection, and more. Unlike other IoT analytics implementations that settle for basic statistics or make many assumptions on the collected data, Intel’s implementation includes a generic analytics layer that uses machine learning and advanced statistical tests to provide meaningful insights to users in different use cases and business domains.
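As a rough illustration of what anomaly detection on a sensor stream can look like, here is a minimal sketch that flags readings far from a rolling mean. It is not Intel's algorithm: the rolling z-score technique, the function name `zscore_anomalies`, and the `window` and `threshold` parameters are all assumptions chosen for brevity, and the abstract describes considerably more advanced statistical tests.

```python
from collections import deque
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate strongly from the recent window.

    Hypothetical helper for illustration only; a simple rolling z-score,
    not the multisensor methods mentioned in the talk.
    """
    history = deque(maxlen=window)  # the most recent `window` readings
    flagged = []
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check when the window is constant (sigma == 0).
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flagged.append(i)
        history.append(x)
    return flagged


# Made-up sample readings with one obvious spike at index 6.
signal = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
anomalies = zscore_anomalies(signal)
```

With the sample data above, only the spike at index 6 is flagged. Note that the spike then inflates the window's standard deviation, which is one reason production systems tend to prefer more robust tests.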
Moty outlines Intel’s “smart data pipe”/stream processing framework, Pigeon, which enables stream analytics at scale. Pigeon, based on Akka, implements a cluster that runs processing topologies, which process the data according to arbitrary logic determined by the users. It creates topologies, balances them across the cluster, and allows nodes to join or leave dynamically. Pigeon is optimized for easy deployment with Docker and CoreOS and cuts down development effort by enabling a single developer to deploy a massive real-time, elastic processing cluster with the click of a button. Spark Streaming was used to deploy self-service data monitors that allow users to define their own rules and trigger an actuation when a certain condition is met. These user-defined rules are monitored in near real time on the stream.
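The idea of self-service data monitors can be sketched in a few lines. The example below is a plain-Python approximation under stated assumptions, not the Spark Streaming implementation described in the talk: the `Rule` class, the `monitor` function, the field names `temp_c` and `volts`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple

Reading = Dict[str, float]  # one sensor event, e.g. {"temp_c": 72.0}


@dataclass(frozen=True)
class Rule:
    """A user-defined rule: a name plus a predicate over a single reading."""
    name: str
    condition: Callable[[Reading], bool]


def monitor(stream: Iterable[Reading],
            rules: Iterable[Rule]) -> Iterator[Tuple[str, Reading]]:
    """Yield (rule name, reading) each time a rule's condition is met."""
    rules = list(rules)  # allow the rules to be re-used for every reading
    for reading in stream:
        for rule in rules:
            if rule.condition(reading):
                yield rule.name, reading


# Hypothetical rules and made-up readings.
rules = [
    Rule("overheat", lambda r: r["temp_c"] > 80.0),
    Rule("low_voltage", lambda r: r["volts"] < 210.0),
]
readings = [
    {"temp_c": 72.0, "volts": 230.0},
    {"temp_c": 85.5, "volts": 229.0},
    {"temp_c": 70.0, "volts": 205.0},
]
alerts = list(monitor(readings, rules))
```

Here the second reading triggers `overheat` and the third triggers `low_voltage`. In a real deployment the yielded pairs would drive the actuation step (a notification or a control action), and the loop would run over micro-batches of an unbounded stream instead of a list.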
Moty then explains how Pigeon and its analytics capabilities were applied to several use cases—both internally and externally—with interesting results. In one POC, Pigeon helped identify a fab tool causing a yield problem; in another POC it showed malfunctions of electrical network voltage sensors. Moty concludes by exploring how operational activities can be “translated” into IoT stream analytics scenarios to allow a higher level of proactivity and a shift from manual monitoring and firefighting to higher-value work.
Moty Fania is a principal engineer and the CTO of the Advanced Analytics Group at Intel, which delivers AI and big data solutions across Intel. Moty has rich experience in ML engineering, analytics, data warehousing, and decision-support solutions. He led the architecture work and development of various AI and big data initiatives such as IoT systems, predictive engines, online inference systems, and more.
©2016, O’Reilly UK Ltd