Training: 8–9 November 2016
Tutorials & Conference: 9–11 November 2016
Amsterdam, NL

Beyond matching: Applying data science techniques to IOC-based detection

Alex Pinto (Niddel)
11:20–12:00 Thursday, 10 November, 2016
Security in context (security datasci)
Location: G104/105 Level: Intermediate
Average rating: 4.29 (7 ratings)

Prerequisite knowledge

  • Familiarity with the significance and “normal usage” of threat intelligence indicators
  • A basic knowledge of the concepts of network security monitoring
  • No background in statistics is necessary

What you'll learn

  • Understand the limitations of IOC matching: indicator data is never exhaustive, and because it is often stale, matching it usually produces a large number of false positives
  • Learn procedures and techniques for extracting higher-level information from IOCs through their enrichment relationships (passive DNS, WHOIS, geolocation, and BGP prefixes), and see how that yields greater insight into the data organizations already have available for detection, with no tool purchase necessary
  • Discover how to measure the efficiency and coverage of IOC feeds against an organization's log data and go beyond “number of IOCs” metrics when choosing a threat intelligence provider
  • Be able to replicate the results with open source code and apply the concepts described in this talk to small datasets of log data and IOCs (such as the ones gathered by the Combine and TIQ-test tools from the MLSec Project)
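
As a rough illustration of what measuring a feed against log data could look like, here is a minimal Python sketch. It is not the TIQ-test implementation; the function name `feed_metrics`, the metric names, and the sample feeds and addresses are all hypothetical, and it assumes indicators and log entries have already been normalized to comparable strings.

```python
from typing import Dict, Iterable, Set


def feed_metrics(feeds: Dict[str, Set[str]],
                 log_indicators: Iterable[str]) -> Dict[str, dict]:
    """Compare each feed with the indicators actually seen in the logs.

    Hypothetical metrics, chosen only for illustration:
      size         - number of IOCs in the feed (the usual vendor metric)
      hits         - feed IOCs that appear in the logs
      hit_rate     - hits / size (how much of the feed is relevant here)
      log_coverage - hits / distinct log indicators
      unique_hits  - hits that no other feed would have produced
    """
    seen = set(log_indicators)
    metrics = {}
    for name, iocs in feeds.items():
        # Union of every other feed, used to find hits unique to this one.
        others = set().union(*(v for k, v in feeds.items() if k != name))
        hits = iocs & seen
        metrics[name] = {
            "size": len(iocs),
            "hits": len(hits),
            "hit_rate": len(hits) / len(iocs) if iocs else 0.0,
            "log_coverage": len(hits) / len(seen) if seen else 0.0,
            "unique_hits": len(hits - others),
        }
    return metrics


# Made-up sample data using documentation-only address ranges.
feeds = {
    "feed_a": {"198.51.100.7", "203.0.113.9", "192.0.2.1"},
    "feed_b": {"203.0.113.9", "192.0.2.55"},
}
logs = ["203.0.113.9", "192.0.2.1", "10.0.0.5", "10.0.0.6"]

for name, m in sorted(feed_metrics(feeds, logs).items()):
    print(name, m)
```

In this toy data, feed_a is larger and also contributes a hit that feed_b lacks, while feed_b's only hit is already covered by feed_a. That is the kind of distinction a raw IOC count hides.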


There is no doubt that indicators of compromise (IOCs) are here to stay. However, at the moment, even the most mature incident response (IR) teams are mainly focused on matching known indicators to their captured traffic or logs. The real eureka moments of using threat intelligence mostly come from the intuition of analysts. You know, the ones that are almost impossible to hire.

Alex Pinto demonstrates how to apply descriptive statistics, graph theory, and nonlinear scoring techniques to the relationships between known network IOCs and log data. These techniques let IR teams encode analyst intuition into repeatable data techniques that simplify the triage stage and yield actionable information with minimal human interaction. Alex also showcases open source tools that can easily be extended to paid or private sources an organization might have access to.
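
To make the idea of scoring through relationships concrete, here is a small Python sketch. It is an assumption-laden illustration and not the method presented in the talk: it links indicators through shared enrichment attributes (a bipartite graph) and combines the evidence with a noisy-OR, which is one simple nonlinear scoring rule. The function `score_candidates`, the attribute kinds, and all domains and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Attr = Tuple[str, str]  # (kind, value), e.g. ("asn", "AS64500")


def score_candidates(enrichment: Dict[str, Set[Attr]],
                     known_bad: Set[str]) -> Dict[str, float]:
    """Score indicators that are not known IOCs by the company they keep."""
    # Other side of the bipartite graph: attribute -> indicators carrying it.
    by_attr = defaultdict(set)
    for indicator, attrs in enrichment.items():
        for attr in attrs:
            by_attr[attr].add(indicator)

    scores = {}
    for indicator, attrs in enrichment.items():
        if indicator in known_bad:
            continue  # already matched by plain IOC lookup
        clean_prob = 1.0
        for attr in attrs:
            neighbours = by_attr[attr] - {indicator}
            if not neighbours:
                continue  # attribute shared with nobody: no evidence
            bad_fraction = len(neighbours & known_bad) / len(neighbours)
            # Noisy-OR: each shared attribute independently adds suspicion.
            clean_prob *= 1.0 - bad_fraction
        scores[indicator] = 1.0 - clean_prob
    return scores


# Made-up enrichment data (reserved example domains and private-use ASNs).
enrichment = {
    "evil.example": {("asn", "AS64500"), ("registrar", "R1")},
    "bad.example": {("asn", "AS64500")},
    "shop.example": {("asn", "AS64501"), ("registrar", "R1")},
    "blog.example": {("asn", "AS64502"), ("registrar", "R1")},
    "new.example": {("asn", "AS64500"), ("registrar", "R2")},
}
known_bad = {"evil.example", "bad.example"}

scores = score_candidates(enrichment, known_bad)
for indicator in sorted(scores, key=scores.get, reverse=True):
    print(indicator, round(scores[indicator], 2))
```

None of the scored domains would be caught by direct matching, yet new.example ranks highest because every other indicator in its ASN is a known IOC. A real deployment would need weights per attribute kind and far more data than this toy graph.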

With these results, you can make IR teams more productive as early as the initial triage stages by giving them data products that provide a sixth sense for which events are worth an analyst’s time. The results also make it painfully evident which IOC feeds help the detection process and which ones do not.

Alex Pinto


Alex Pinto is the chief data scientist of Niddel and the lead for the MLSec Project. Alex is currently dedicating his waking hours to developing machine learning algorithms and data science techniques that automate threat hunting (I know) and make threat intelligence “actionable” (I know, I know). If you care about certifications at all, Alex is currently a CISSP-ISSAP, CISA, CISM, and PMP. He was also a PCI-QSA for almost seven years but is a mostly OK person in spite of that.