High-volume event streams (traditional network data, media, IoT sensor data, activity events on social media, etc.) are becoming widespread in the telecom industry. In particular, live analysis of telco log files and performance metrics allows network operators to observe the status of the system and identify possible problems using online aggregations and machine-learning algorithms. (Offline batch analysis of streams with tools like MapReduce is often too slow to react to events as they happen, so it is a poor fit for this kind of monitoring.)
Ignacio Manuel Mulas Viela and Nicolas Seyvet demonstrate an analytics pipeline setup for a telco use case that processes an unbounded dataset of logs and performance metrics. Raw data, logs, and cloud telemetry information are extracted from a production cloud infrastructure using collectd, OpenStack Ceilometer, and Logstash. The data is piped into a distributed messaging system, Kafka, and then analyzed by Apache Flink, a distributed stream analysis framework capable of analyzing thousands of messages per second and extracting insights that can be monitored by humans. The results are visualized using the ELK (Elasticsearch, Logstash, Kibana) stack.
Ignacio and Nicolas discuss the challenges and benefits of building an analytics pipeline following the Kappa architecture paradigm using the aforementioned tools and demonstrate Kappa’s value through an example use case. The use case analyzes and extracts statistical information from a stream of data and uses machine-learning techniques to develop an advanced anomaly detector, using two online machine-learning algorithms implemented on top of Flink: the online k-means detector and the Bayesian detector.
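The abstract does not describe how the speakers' detectors are implemented, so the following is only a minimal sketch of the general idea behind an online k-means anomaly detector, written in plain Python rather than on top of Flink. It assumes sequential (one point at a time) centroid updates and a fixed distance threshold for flagging anomalies; the class name `OnlineKMeansDetector`, the `threshold` parameter, and the choice to seed centroids with the first k points are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


class OnlineKMeansDetector:
    """Sequential k-means with a distance-based anomaly flag (illustrative only)."""

    def __init__(self, k: int, threshold: float) -> None:
        self.k = k                      # number of clusters (assumed fixed)
        self.threshold = threshold      # hypothetical anomaly distance cutoff
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, point: Sequence[float]) -> bool:
        """Score one point, fold it into the model, and return True if anomalous."""
        # The first k points seed the centroids and are never flagged.
        if len(self.centroids) < self.k:
            self.centroids.append(list(point))
            self.counts.append(1)
            return False

        # Euclidean distance from the point to every centroid.
        distances = [
            math.sqrt(sum((c - p) ** 2 for c, p in zip(centroid, point)))
            for centroid in self.centroids
        ]
        nearest = min(range(self.k), key=distances.__getitem__)
        is_anomaly = distances[nearest] > self.threshold

        # Only non-anomalous points move a centroid, so outliers do not
        # drag the model toward themselves.
        if not is_anomaly:
            self.counts[nearest] += 1
            rate = 1.0 / self.counts[nearest]   # running-mean step size
            centroid = self.centroids[nearest]
            for i, value in enumerate(point):
                centroid[i] += rate * (value - centroid[i])
        return is_anomaly
```

Each call to `update` handles a single event, which is the property that lets this style of algorithm run inside a stream processor: the model state is just the centroids and their counts, and no pass over historical data is needed. A production detector would likely need an adaptive threshold and some way to age out old data, neither of which is shown here.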
Ignacio Mulas is a researcher working in the area of cloud analytics at Ericsson Research. He is an experienced cloud software engineer and data scientist. Lately, Ignacio has been interested in the development of streaming analytics pipelines following the Kappa architecture and their applicability to industrial use cases.
Nicolas Seyvet is a passionate software developer at Ericsson AB. Nicolas has worked on a wide range of telco-grade (high-availability, scalable, redundant) applications for the telecom/multimedia business and is experienced in Java/JEE (10+ years), C/C++ (7+ years), and databases (SQL, NoSQL). He joined Ericsson Research to work on big data, the cloud, and analytics, where he has built OpenStack and Hadoop/Spark clusters as well as algorithms for real-time data. Nicolas’s particular interests are coding, software engineering, software architecture, distributed and scalable systems, distributed processing, lean/agile methodologies, and the principles of good leadership. He specializes in software design and architecture of complex, high-performance systems, as well as leading high-performing cross-functional teams.
©2016, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.