October 30–31, 2016: Training
October 31–November 2, 2016: Tutorials & Conference
New York, NY

TRAINING: Foundations of security data science (Day 2)

Jay Jacobs (BitSight Technologies), Bob Rudis (Rapid7)
9:00am–5:00pm Monday, 10/31/2016
Location: Concourse E

Join Jay Jacobs, Charles Givre, and Bob Rudis for a hands-on, in-depth exploration of the foundations of security data science. You’ll learn how to explore and analyze data you probably already have and gain experience with tools and techniques to prepare, analyze, and visualize the knowledge hiding in your data. Jay, Charles, and Bob guide you through three practical applications built on real data, introducing each in a language-agnostic way before providing language-specific guidance for the hands-on work. A GitHub repository with the examples will be available so that you can revisit them and continue learning after the training.

If you are a security analyst who needs to leverage more data in your analyses, work in operations and know you can get more out of the data you have, or already identify vulnerabilities and weaknesses in systems and networks but need to better communicate your team’s findings during engagements, this is the training for you.

Day 2 Outline

Project showcase from Day 1 (30 minutes)

  • Instructors kick off the day by reviewing some concepts and showcasing some participant work from Day 1.
  • Participants will see and discuss the strengths and opportunities of the analysis done by other participants.

Core clustering and unsupervised learning (60 minutes)

  • Supervised versus unsupervised learning
  • Unsupervised learning: What it is, how it works, when to use it, and some typical use cases for applying it
  • Specific unsupervised techniques and how they work (language-specific implementations provided as examples)
  • The importance of and techniques for feature generation and the role of domain expertise
  • Introduction to the dataset and the question we need to answer
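
As one illustration of a specific unsupervised technique, here is a minimal k-means sketch in plain Python. It is not taken from the course materials: the function names and the toy two-feature points are assumptions made up for this example, and in practice a library implementation would be the more likely choice.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters with the basic k-means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                updated.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels

# Toy, made-up data: two well-separated groups of two-feature observations.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

No labels are supplied anywhere: the grouping comes only from distances between the observations, which is why the choice and scaling of features matters so much.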

Vulnerability data challenge—hands-on lab (90 minutes)

  • Instructors will provide a real-world dataset and the challenge.
  • Participants will prepare and explore the data and develop a research question to answer. (Key questions will be provided that can be answered in the time allotted, but participants can identify additional ones if they have existing knowledge about vulnerability management.) You can submit your work to the training GitLab instance.

Lunch break

Morning wrap-up (30 minutes)

  • Instructors review concepts from the morning and showcase some work from the participants.
  • Participants will see and discuss the strengths and opportunities of the analysis done by other participants.

Core classification and supervised learning (60 minutes)

  • Supervised learning: What it is, how it works, when to use it, and some typical use cases for applying it
  • Random forests and how they work (language-specific implementations provided as examples)
  • Discussion of the importance of and techniques for feature generation and the role of domain expertise
  • Introduction to the dataset and the question we need to answer

Domain-generating algorithms—hands-on lab (90 minutes)

  • Instructors will provide a real-world dataset and the challenge.
  • Participants will prepare and explore the data, generate features, and do some supervised learning on the data. You can submit your work to the training GitLab instance.
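
To give a sense of what feature generation for domain names can look like, here is a small hedged sketch in Python. The specific features (label length, character entropy, vowel and digit ratios) and the sample domains are assumptions chosen for illustration, not the features or data used in the lab.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy (bits per character) of the characters in text."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def domain_features(domain):
    """Turn a domain name into a small dictionary of numeric features.

    Simplification: only the leftmost label is examined, so
    "www.example.com" would be scored on "www".
    """
    label = domain.lower().split(".")[0]
    length = len(label)
    vowels = sum(ch in "aeiou" for ch in label)
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowels / length if length else 0.0,
        "digit_ratio": digits / length if length else 0.0,
    }

# Illustrative inputs only: one ordinary name and one random-looking string.
ordinary = domain_features("example.com")
random_looking = domain_features("xq7k2vz9pl4m.com")
print(ordinary)
print(random_looking)
```

Rows of features like these, paired with a known label for each domain, are the kind of input a supervised learner such as a random forest would be trained on.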

Course wrap-up (30 minutes)

  • Instructors will review the material covered in the afternoon session and then conclude by discussing the big picture and how the techniques you learned will help you with the work ahead, with heavy emphasis placed on continued learning.

Jay Jacobs

BitSight Technologies

Jay Jacobs is the senior data scientist at BitSight Technologies. Previously, Jay spent four years as the lead data analyst for the Verizon Data Breach Investigations Report. Jay is the coauthor of Data-Driven Security, which covers data analysis and visualizations for information security, and hosts the Data-Driven Security and R World News podcast. Jay is also a cofounder of the Society of Information Risk Analysts and currently serves on its board of directors. Jay is active in the R community; he coordinates his local R user group for the greater Minneapolis area and contributes to local events and functions supporting data analysis.

Bob Rudis

Rapid7

Bob Rudis has over 20 years of experience using data to help defend global Fortune 100 companies. Bob is currently (master) chief security data scientist at Rapid7. He was formerly a security data scientist and managing principal at Verizon, overseeing the team that produces the annual Data Breach Investigations Report. Bob is a serial tweeter, an avid blogger, a coauthor of Data-Driven Security, a speaker, and a regular contributor to the open source community. He currently serves on the board of directors for the Society of Information Risk Analysts, is on the editorial board of the SANS Securing the Human program, and was cochair of the 2014 Metricon security metrics/analytics conference. Bob was chosen as one of SANS’s People Who Made a Difference in Security in 2015 and holds a bachelor’s degree in computer science from the University of Scranton.