Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Masquerading malicious DNS traffic

David Rodriguez (Cisco Systems)

11:50am–12:30pm Thursday, March 28, 2019

Data Science, Machine Learning & AI
Location: 2010

Secondary topics: Security and Privacy, Temporal data and time-series analytics

Who is this presentation for?

Machine learning engineers, data scientists, and statisticians

Level

Intermediate

Prerequisite knowledge

A basic understanding of Apache Spark, statistics, and probability

What you'll learn

Learn how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains, and explore techniques for identifying artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic)

Description

Masquerading traffic is artificially generated traffic mixed in with normal traffic. Detecting the resulting change in behavior is often difficult because network traffic is inherently random, which causes most supervised and unsupervised statistical models to fail.

David Rodriguez explains how Cisco performs large-scale Bayesian inference on DNS logs to uncover masquerading traffic in count data: the number of requests made by tens of millions of stub IPs to hundreds of millions of domains. Using novel mixtures of common discrete distributions, or hidden Markov processes, the company models some of the most sporadic network traffic volumes to domain names. With zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) distributions and their more generalized forms, it treats the gaps between requests as just as important as the requests themselves, teasing out underlying changes in request patterns.
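
To make the zero-inflation idea concrete, here is a minimal Python sketch of the textbook ZIP distribution, where P(X = 0) = pi + (1 - pi) * exp(-lam) and P(X = k) = (1 - pi) * Poisson(k; lam) for k > 0. It is an illustration only, not the models presented in the talk: the request counts, the parameter values, and the function names are all assumptions made up for this example.

```python
import math

def zip_pmf(k, pi, lam):
    """P(X = k) under a zero-inflated Poisson.

    With probability pi the count is a structural zero (an idle interval);
    otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson

def zip_log_likelihood(counts, pi, lam):
    """Log-likelihood of independent counts under ZIP(pi, lam)."""
    return sum(math.log(zip_pmf(k, pi, lam)) for k in counts)

# Hypothetical hourly request counts for one domain (not real data).
counts = [0, 0, 0, 12, 0, 0, 9, 0, 0, 0, 15, 0]

# Plain Poisson is the pi = 0 special case; its best-fitting rate is the sample mean.
mean_rate = sum(counts) / len(counts)
poisson_ll = zip_log_likelihood(counts, 0.0, mean_rate)

# Assumed ZIP parameters: 75% of intervals idle, about 12 requests when active.
zip_ll = zip_log_likelihood(counts, 0.75, 12.0)

print(f"Poisson log-likelihood: {poisson_ll:.2f}")
print(f"ZIP log-likelihood:     {zip_ll:.2f}")
```

On sporadic counts like these, the ZIP parameters score a higher log-likelihood than the best single-rate Poisson, because the extra parameter pi explains the gaps instead of forcing one rate to cover both the zeros and the bursts.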

The company then combines Apache Spark and Stripe’s Rainier to distribute and perform Bayesian modeling, running thousands of Markov chain Monte Carlo (MCMC) simulations to fit the underlying requester patterns. David demonstrates how the parameters of these models offer insights into changes that aren’t easily discerned by eye. Only with hundreds of thousands of simulated and archived traffic patterns associated with benign and malicious network traffic can you begin to unravel how to reduce false alarms and effectively monitor evolving online threats and masquerading malicious traffic.
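
As a rough illustration of what fitting such a model with MCMC involves, the sketch below runs a plain random-walk Metropolis sampler over the two ZIP parameters for a single domain. It does not use Rainier or Spark and is not the sampler from the talk. The priors (pi uniform on (0, 1), lam exponential with mean 10), the proposal step sizes, the request counts, and all names are assumptions chosen for this example.

```python
import math
import random

def zip_log_pmf(k, pi, lam):
    """Log P(X = k) under a zero-inflated Poisson(pi, lam)."""
    log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(log_pois))
    return math.log(1.0 - pi) + log_pois

def log_posterior(counts, pi, lam):
    """Unnormalized log posterior under the assumed priors."""
    if not (0.0 < pi < 1.0) or lam <= 0.0:
        return -math.inf  # outside the support of the priors
    log_prior = -lam / 10.0  # Exponential(mean 10) on lam; flat on pi
    return log_prior + sum(zip_log_pmf(k, pi, lam) for k in counts)

def metropolis(counts, n_samples=5000, burn_in=1000, seed=0):
    """Random-walk Metropolis over (pi, lam) with Gaussian proposals."""
    rng = random.Random(seed)
    pi, lam = 0.5, 1.0
    current = log_posterior(counts, pi, lam)
    samples = []
    for step in range(n_samples + burn_in):
        pi_new = pi + rng.gauss(0.0, 0.1)
        lam_new = lam + rng.gauss(0.0, 1.0)
        proposed = log_posterior(counts, pi_new, lam_new)
        # Symmetric proposal, so accept with probability min(1, ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            pi, lam, current = pi_new, lam_new, proposed
        if step >= burn_in:
            samples.append((pi, lam))
    return samples

# Hypothetical hourly request counts for one domain (not real data).
counts = [0, 0, 0, 12, 0, 0, 9, 0, 0, 0, 15, 0]
samples = metropolis(counts)
mean_pi = sum(s[0] for s in samples) / len(samples)
mean_lam = sum(s[1] for s in samples) / len(samples)
print(f"posterior mean pi  ~ {mean_pi:.2f}")
print(f"posterior mean lam ~ {mean_lam:.2f}")
```

The posterior samples, rather than a single point estimate, are what make the parameters useful for monitoring: a shift in the distribution of pi or lam between time windows is one way a change in request pattern could show up.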

Topics include:

The latest advances in Bayesian inference on the JVM using Stripe’s open source Rainier project
How to scale Bayesian inference to internet-scale datasets using Apache Spark
How to build time-dependent risk and severity metrics identifying network anomalies
How to model sporadic network traffic using discrete probability distributions
How to build hidden Markov models (HMMs) capturing idle and active states of network traffic
How to use Markov chain Monte Carlo (MCMC) methods
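
The idle and active states mentioned in the topics above can be illustrated with a small two-state hidden Markov model with Poisson emissions. The Python sketch below implements the standard scaled forward algorithm. It is not the model from the talk, and the rates, transition probabilities, request counts, and names are all assumptions picked for the example.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) under Poisson(lam)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def forward_filter(counts, rates, transition, initial):
    """Scaled forward algorithm for an HMM with Poisson emissions.

    Returns the log-likelihood of the counts and, for each time step,
    the filtered probability of each hidden state given counts so far.
    """
    n_states = len(rates)
    belief = list(initial)
    log_lik = 0.0
    filtered = []
    for t, k in enumerate(counts):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update: weight each state by how well it explains the count.
        weighted = [belief[j] * poisson_pmf(k, rates[j]) for j in range(n_states)]
        norm = sum(weighted)
        log_lik += math.log(norm)
        belief = [w / norm for w in weighted]
        filtered.append(belief)
    return log_lik, filtered

# Assumed parameters: state 0 is idle (almost no requests), state 1 is active.
rates = [0.05, 10.0]
transition = [[0.9, 0.1],   # idle   -> idle, active
              [0.2, 0.8]]   # active -> idle, active
initial = [0.5, 0.5]

# Hypothetical request counts per interval (not real data).
counts = [0, 0, 0, 12, 9, 15, 0, 0]
log_lik, filtered = forward_filter(counts, rates, transition, initial)
states = [0 if p[0] > p[1] else 1 for p in filtered]
print("most likely state per interval:", states)
print(f"log-likelihood: {log_lik:.2f}")
```

In a full Bayesian treatment the rates and transition probabilities would themselves be unknowns with priors, and this forward pass would supply the likelihood inside the sampler.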

David Rodriguez

Cisco Systems

David Rodriguez is a senior research engineer at Cisco Umbrella (formerly OpenDNS). He has coauthored multiple pending patents with Cisco in distributed machine learning applications centered around deep learning and behavioral analytics. He’s a frequent speaker on machine learning in cybersecurity at conferences including Flink Forward, Black Hat, FloCon, Virus Bulletin, and HitBSEC. David holds an MA in mathematics from San Francisco State University.
