Collecting metadata from a network as large as Vodafone’s (the UK’s second-largest telecoms provider) poses two major obstacles: transporting the sheer volume of data (cumulative bandwidth) and processing it before it no longer accurately reflects the state of the network (cumulative delay). Fortunately, combining Apache Flume and Apache Kafka in the Flafka pattern provides a means to move data into the enterprise data hub (EDH), a Hadoop cluster, and to readily scale the pipeline to handle both transient and persistent spikes in data volume.
Flume and Kafka are both capable of high-performance, low-latency event processing; however, careful tuning is required to achieve performance at this scale. Vodafone has deployed Flume and Kafka across the UK network in a geographically distributed architecture that achieves scale and resilience. The pipeline has been tuned from around 10,000 events per second on initial deployment to 1,000,000 events per second using a three-node Kafka cluster. Tristan Stevens discusses the architecture, deployment, and performance-tuning techniques that enable the system to perform at IoT scale on modest hardware and at a very low cost.
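To put the quoted throughput figures in perspective, the following is a rough back-of-envelope sketch rather than anything taken from the talk. The event rates and the broker count come from the abstract; the average event size (`ASSUMED_EVENT_BYTES`) and the even spread of load across brokers are illustrative assumptions, since the abstract states neither.

```python
# Back-of-envelope arithmetic for the throughput figures quoted in the abstract.
# Rates and broker count are from the abstract; the event size is an ASSUMPTION.

INITIAL_EVENTS_PER_SEC = 10_000     # "around 10,000" on initial deployment
TUNED_EVENTS_PER_SEC = 1_000_000    # after tuning
KAFKA_BROKERS = 3                   # three-node Kafka cluster
ASSUMED_EVENT_BYTES = 500           # hypothetical average event size, not from the source


def speedup(before: int, after: int) -> float:
    """Factor by which throughput improved."""
    return after / before


def per_broker_rate(total_events_per_sec: int, brokers: int) -> float:
    """Events per second per broker, assuming load is spread evenly."""
    return total_events_per_sec / brokers


def bandwidth_mbit_per_sec(events_per_sec: int, event_bytes: int) -> float:
    """Aggregate payload bandwidth in megabits per second (1 Mbit = 10**6 bits)."""
    return events_per_sec * event_bytes * 8 / 1_000_000


if __name__ == "__main__":
    print("speedup factor:", speedup(INITIAL_EVENTS_PER_SEC, TUNED_EVENTS_PER_SEC))
    print("events/s per broker:", round(per_broker_rate(TUNED_EVENTS_PER_SEC, KAFKA_BROKERS)))
    print("payload Mbit/s at assumed size:",
          bandwidth_mbit_per_sec(TUNED_EVENTS_PER_SEC, ASSUMED_EVENT_BYTES))
```

Changing `ASSUMED_EVENT_BYTES` shows how sensitive the cumulative-bandwidth figure is to event size, which is one reason both volume and delay have to be considered when sizing such a pipeline.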
Tristan Stevens is a senior solutions architect at Cloudera, where he helps clients across EMEA with their Hadoop implementations. Tristan’s background is in the UK defence sector. He has also worked on large-scale, highly available, business-critical analytics platforms, with more recent experience in gaming, telecoms, and financial services.
©2017, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org