October 30–31, 2016: Training
October 31–November 2, 2016: Tutorials & Conference
New York, NY

Detecting anomalies efficiently at scale: A cybersecurity streaming data pipeline using Kafka and Akka clustering

Jeff Henrikson (Groovescale)
2:10pm–2:50pm Tuesday, 11/01/2016
Security in context (security datasci)
Location: Rendezvous Trianon
Average rating: 3.75 (4 ratings)

What you'll learn

  • Explore a data pipeline for digesting useful analytics for intrusion detection from aggregated PCAP, with an emphasis on its highest-throughput stage: conversion of PCAP to a netflow-like format

Description

Network packet broker hardware is one way to acquire network monitoring data at scale for on-premises intrusion detection. Deployment of this kind of hardware is easy to understand. However, the result is a highly concentrated network capture source. Thus, the next challenge in developing an intrusion detection system becomes finding the tiny amount of relevant information in a very large stream—and doing so efficiently.

Jeff Henrikson presents a data pipeline for digesting useful analytics for intrusion detection from aggregated PCAP, with an emphasis on its highest-throughput stage: conversion of PCAP to a netflow-like format. The main building blocks for the system are libpcap, Kafka, Scala, Akka, and Docker. The pipeline runs efficiently at 10 GB per second with an end-to-end latency of two minutes and processes streams without approximation. Any individual node can be removed from the system without disruption. Jeff compares the upfront design with the final design and shares lessons about the building blocks the team discovered along the way.
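The core of the PCAP-to-netflow conversion stage is aggregating individual packets into per-flow summaries keyed by the 5-tuple. A minimal Scala sketch of that idea follows; the types and field names here are hypothetical illustrations, not the speaker's actual code, and the real pipeline would of course consume packets from Kafka and run the aggregation across an Akka cluster rather than over an in-memory sequence:

```scala
// Hypothetical decoded-packet record (in practice produced via libpcap).
case class Packet(srcIp: String, dstIp: String, srcPort: Int, dstPort: Int,
                  proto: Int, bytes: Long, tsMillis: Long)

// A netflow-like flow summary for one 5-tuple.
case class FlowRecord(key: (String, String, Int, Int, Int),
                      packets: Long, bytes: Long,
                      firstSeen: Long, lastSeen: Long)

object FlowAggregator {
  // Fold a batch of packets into per-flow summaries keyed by the 5-tuple.
  def aggregate(pkts: Seq[Packet]): Map[(String, String, Int, Int, Int), FlowRecord] =
    pkts.foldLeft(Map.empty[(String, String, Int, Int, Int), FlowRecord]) { (acc, p) =>
      val k = (p.srcIp, p.dstIp, p.srcPort, p.dstPort, p.proto)
      val updated = acc.get(k) match {
        case Some(f) => f.copy(packets   = f.packets + 1,
                               bytes     = f.bytes + p.bytes,
                               firstSeen = math.min(f.firstSeen, p.tsMillis),
                               lastSeen  = math.max(f.lastSeen, p.tsMillis))
        case None    => FlowRecord(k, 1L, p.bytes, p.tsMillis, p.tsMillis)
      }
      acc + (k -> updated)
    }
}
```

Because each flow record is an associative, commutative summary of its packets, partial aggregates computed on different cluster nodes can be merged later, which is what makes it safe to remove any individual node without disrupting the stream.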


Jeff Henrikson

Groovescale

Jeff Henrikson is a software consultant in the area of cybersecurity with 15 years of experience in data science and data engineering. Previously, Jeff worked at Amazon on the retail website page with the highest revenue per impression; productionized Intentional Software’s first product and coauthored the first reference manual with founder Charles Simonyi; and worked in computer vision, insurance catastrophe modeling, and manufacturing science. Jeff holds degrees in math from MIT and jazz composition from Berklee College of Music. Last spring, Jeff created and taught the course Building the Data Pipeline for the Big Data Certificate program offered by University of Washington Professional and Continuing Education.