Network packet broker hardware is one way to acquire network monitoring data at scale for on-premises intrusion detection. Deployment of this kind of hardware is easy to understand, but the result is a highly concentrated network capture source. The next challenge in developing an intrusion detection system is therefore finding the tiny amount of relevant information in a very large stream, and doing so efficiently.
Jeff Henrikson presents a data pipeline for deriving useful intrusion detection analytics from aggregated PCAP, with an emphasis on its highest-throughput stage: conversion of PCAP to a netflow-like format. The main building blocks for the system are libpcap, Kafka, Scala, Akka, and Docker. The pipeline runs efficiently at 10 GB a second with end-to-end latency of two minutes and processes streams without approximation. Any individual node can be removed from the system without disruption. Jeff shows how the upfront design compared with the final design and shares what the team learned about the building blocks along the way.
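To make the idea of a "netflow-like format" concrete, here is a minimal sketch of collapsing individual packets into per-flow summaries. It is not the speaker's implementation, which is built on libpcap, Scala, and Akka; it is written in Python for brevity, and the `Packet` and `FlowRecord` types and the `aggregate_flows` function are hypothetical names introduced only for illustration. It assumes packets have already been decoded into header fields, treats each direction of a conversation as a separate flow, and ignores flow timeouts.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical, simplified view of one decoded packet header.
# ts is a capture timestamp in seconds; proto is the IP protocol number.
Packet = namedtuple("Packet", "ts src_ip dst_ip src_port dst_port proto length")


@dataclass
class FlowRecord:
    """Summary of all packets that share one 5-tuple."""
    first_seen: float
    last_seen: float
    packets: int = 0
    bytes: int = 0


def aggregate_flows(packets):
    """Group packets by 5-tuple and return a dict of key -> FlowRecord."""
    flows = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(first_seen=p.ts, last_seen=p.ts)
        # Counts are exact sums, so nothing is sampled or approximated.
        rec.first_seen = min(rec.first_seen, p.ts)
        rec.last_seen = max(rec.last_seen, p.ts)
        rec.packets += 1
        rec.bytes += p.length
    return flows


# Made-up sample traffic: two TCP packets in one flow, one UDP packet in another.
sample = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 40000, 443, 6, 60),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 40000, 443, 6, 1500),
    Packet(0.7, "10.0.0.3", "10.0.0.2", 40001, 53, 17, 80),
]
flows = aggregate_flows(sample)
for key, rec in flows.items():
    print(key, rec)
```

The output of such a stage is far smaller than the packet stream it consumes, which is one reason a conversion step like this is a natural place to focus throughput work. A production version would also need to expire idle flows and partition keys across nodes, neither of which is shown here.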
Jeff Henrikson is a software consultant in the area of cybersecurity with 15 years of experience in data science and data engineering. Previously, Jeff worked at Amazon on the retail website page with the highest revenue per impression; productionized Intentional Software’s first product and coauthored the first reference manual with founder Charles Simonyi; and worked in computer vision, insurance catastrophe modeling, and manufacturing science. Jeff holds degrees in math from MIT and jazz composition from Berklee College of Music. Last spring, Jeff created and taught the course Building the Data Pipeline for the Big Data Certificate program offered by University of Washington Professional and Continuing Education.
©2016, O'Reilly Media, Inc.