Presented By O’Reilly and Intel AI
Put AI to Work
April 29-30, 2018: Training
April 30-May 2, 2018: Tutorials & Conference
New York, NY

Adversarial ML: Practical attacks and defenses against graph-based clustering

Yacin Nadji (Georgia Institute of Technology)
2:35pm–3:15pm Tuesday, May 1, 2018
Implementing AI, Models and Methods
Location: Concourse A
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • Machine learning engineers

Prerequisite knowledge

  • A basic understanding of how to evaluate a supervised ML model and a working knowledge of graphs

What you'll learn

  • Explore the weaknesses of machine learning in adversarial environments
  • Learn how to adversarially evaluate an ML system
  • Understand the concept of adversary knowledge levels in ML contexts

Description

As machine learning (ML) is increasingly used in security, practitioners and researchers must understand the pitfalls ML presents in adversarial contexts. Attackers already evade signatures and heuristics, and they evade statistical models too. Yacin Nadji offers background on the academic security community's efforts to understand how to break and fix ML systems, efforts that inevitably devolve into the cat-and-mouse game seen in many facets of security. However, those who can find mice better will stay cats longer.

Yacin begins with a high-level overview of the adversarial machine learning space before walking you through tearing down, evaluating, and fixing a deployed network-based domain generation algorithm (DGA) detector that uses graph clustering. While the described attacks and fixes are specific to graph clustering, the same process can be applied to other ML systems to perform adversarial evaluation. Novel contributions include evaluating unsupervised graph learning as well as considering the level of knowledge an attacker possesses, which is paramount when ML systems rely on a nonlocal feature space.

Consider an ML system that extracts features from a large ISP's network traffic to detect infected hosts. An adversary who knows only the network traffic of its own infections is less equipped to evade this system than one that has compromised the ISP's training dataset from which the features are constructed. Prior work often assumes an attacker with only black-box access to the model, but the most sophisticated attackers are likely to have reverse engineered the training dataset or surreptitiously acquired it through illegitimate means. Threat models for ML systems must include these sophisticated attackers if they are to remain relevant.
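To make this kind of adversarial evaluation concrete, here is a minimal, hypothetical sketch. It is not the detector described in the talk; it assumes the networkx library (2.8 or later, for Louvain communities), toy data, and illustrative names. Hosts and the domains they query form a bipartite graph, communities are recovered by modularity clustering, and an attacker who knows which benign domains dominate the training traffic injects "noise" queries to try to blend its bots into benign communities.

    # Hypothetical sketch of adversarially evaluating a graph-clustering detector.
    # Assumes networkx >= 2.8; all names and sizes are illustrative toy data.
    import random
    import networkx as nx
    from networkx.algorithms.community import louvain_communities

    random.seed(0)

    def build_graph(noise_edges_per_bot=0):
        """Bipartite host-domain graph with benign hosts, bots, and DGA domains."""
        G = nx.Graph()
        popular = [f"benign{i}.com" for i in range(5)]
        for h in (f"host{i}" for i in range(20)):   # benign hosts query popular domains
            for d in random.sample(popular, 3):
                G.add_edge(h, d)
        bots = {f"bot{i}" for i in range(5)}
        dga = [f"x{i}.evil.net" for i in range(10)]
        for b in bots:
            for d in random.sample(dga, 6):         # shared DGA domains link the bots
                G.add_edge(b, d)
            # Evasion: an attacker with training-data-level knowledge knows which
            # benign domains are popular in the ISP's traffic and queries them too;
            # a black-box attacker would have to guess this part of the feature space.
            for d in random.sample(popular, noise_edges_per_bot):
                G.add_edge(b, d)
        return G, bots

    def bots_form_own_community(G, bots):
        """True if some community contains bot hosts and no benign hosts."""
        for comm in louvain_communities(G, seed=0):
            hosts = {n for n in comm if n.startswith(("host", "bot"))}
            if hosts and hosts <= bots:
                return True
        return False

    # Adversarial evaluation loop: increase attacker effort, re-test the detector.
    for noise in range(0, 4):
        G, bots = build_graph(noise_edges_per_bot=noise)
        print(f"{noise} noise edges per bot -> bots isolated: "
              f"{bots_form_own_community(G, bots)}")

Raising noise_edges_per_bot models increasing attacker knowledge and effort; the same loop structure extends to other graph perturbations besides edge injection, and to other clustering algorithms.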


Yacin Nadji

Georgia Institute of Technology

Yacin Nadji is a research scientist at the Georgia Institute of Technology. An expert in computer security, he has worked at numerous companies building and improving machine learning-based fraud and abuse detection systems at scale. Yacin is the author of 16 academic publications with over 600 citations, has served as a reviewer for academic security conferences and journals, and has given talks at several industry conferences and symposia. He holds a PhD in computer science from the Georgia Institute of Technology.