Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Ram Shankar Siva Kumar (Microsoft (Azure Security Data Science)), Andrew Wicker (Microsoft (Azure Security Data Science))
11:00am–11:40am Wednesday, March 15, 2017
Platform Security and Cybersecurity
Location: LL21 B
Level: Intermediate
Secondary topics: Cloud
Average rating: 4.50 (4 ratings)

Who is this presentation for?

  • Data scientists working in security, security analysts, and ML engineers

What you'll learn

  • Explore the challenges that compliance/governance introduces in model design
  • Learn six strategies and their trade-offs for overcoming the "cold start" problem of having no labeled data during model evaluation
  • Gain exposure to generative adversarial networks and understand their relevance in security settings
  • Learn how to scale anomaly detection systems for the cloud
  • Discover how to define meaningful metrics to assess if a given security data science solution actually meets business needs


In most security data science talks that describe a specific algorithm used to solve a security problem, the audience is left wondering: how did they perform system testing when there is no labeled attack data, what metrics do they monitor, and what do these systems actually look like in production? Academia and industry both focus largely on security detection, but the emphasis is almost always on the algorithmic machinery powering the systems. Prior art on productizing these solutions is sparse: the problem has been studied from a machine-learning angle or from a security angle, but the two have not been explored jointly. Yet the intersection of operationalizing security and machine-learning solutions is important, not only because security data science solutions inherit complexities from both fields but also because each field brings unique challenges that affect the other. For instance, compliance restrictions that dictate data cannot be exported from specific geographic locations (a security constraint) have a downstream effect on model design, deployment, evaluation, and management strategies (a data science constraint).

Ram Shankar Siva Kumar and Andrew Wicker explain how to operationalize security analytics for production in the cloud, covering a framework for assessing the impact of compliance on model design, six strategies and their trade-offs to generate labeled attack data for model evaluation, key metrics for measuring security analytics efficacy, and tips to scale anomaly detection systems in the cloud. Ram and Andrew explore lessons learned in taking a prototype security analytics system and productizing it with help from teams across Microsoft in a variety of roles, from security analysts in Azure Cloud Security and researchers in Microsoft Research to applied ML engineers in Azure Security Data Science and service engineers on the Service and Reliability team.

Ram and Andrew begin with a focus on the impact of compliance on model design, discussing the balkanization of the cloud (i.e., how certain countries have strict laws against moving data across borders and how those laws affect model design). The first problem that data scientists encounter is that fractured data makes it very difficult to identify macro trends. Ram and Andrew propose tiered model building, wherein local models are built in the respective national clouds along with a global model that is informed only of the output of the local models, respecting compliance and privacy requirements.

Ram and Andrew then explain how to evaluate a security data science system when there is no attack data. You'll learn techniques for generating attack data, such as using common attacker tools, red teaming, threat intelligence feeds, and cross-product pollination, to verify that the system works, along with the inherent trade-offs between the different strategies. Ram and Andrew also cover the relevance of generative adversarial networks, a new technique in deep learning that can potentially provide higher-quality samples than sampling techniques like SMOTE. Ram and Andrew conclude with a discussion on model management, focusing on autoscaling the system, illustrated using a case study in detecting anomalous user behavior in SharePoint.
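
The tiered model building described above can be illustrated with a minimal sketch. It is not the speakers' implementation: the z-score detector, the flag-rate comparison, and every name in it (LocalSummary, local_model, global_model, z_threshold, ratio) are assumptions chosen for illustration. The only point it tries to show is that raw events stay inside each regional model and the global model sees nothing but the local outputs.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class LocalSummary:
    """The only data a regional model exports: aggregate outputs, no raw events."""
    region: str
    n_users: int
    n_flagged: int


def local_model(region, events_per_user, z_threshold=2.0):
    """Hypothetical local detector; runs inside one national cloud.

    Flags users whose event count is more than z_threshold population
    standard deviations above the regional mean, then returns counts only.
    """
    counts = list(events_per_user.values())
    mu, sigma = mean(counts), pstdev(counts)
    flagged = [
        user for user, count in events_per_user.items()
        if sigma > 0 and (count - mu) / sigma > z_threshold
    ]
    return LocalSummary(region, len(counts), len(flagged))


def global_model(summaries, ratio=1.5):
    """Hypothetical global tier; sees only LocalSummary objects.

    Returns the regions whose flagged rate exceeds `ratio` times the
    flagged rate across all regions combined.
    """
    total_users = sum(s.n_users for s in summaries)
    total_flagged = sum(s.n_flagged for s in summaries)
    if total_users == 0 or total_flagged == 0:
        return []
    overall_rate = total_flagged / total_users
    return [
        s.region for s in summaries
        if s.n_users > 0 and s.n_flagged / s.n_users > ratio * overall_rate
    ]


# Made-up per-user event counts for two regions (illustrative data only).
region_de = {f"de-user{i}": 10 for i in range(9)}
region_de["de-user9"] = 100          # one unusually active account
region_us = {f"us-user{i}": 5 for i in range(10)}

summaries = [local_model("de", region_de), local_model("us", region_us)]
suspicious_regions = global_model(summaries)
```

In this sketch the global tier can still surface a macro trend (one region flagging users at a higher rate than the others) even though it never receives per-user data. The trade-off is that it cannot explain why a region stands out without asking the local tier.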

Ram Shankar Siva Kumar

Microsoft (Azure Security Data Science)

Ram Shankar is a security data wrangler in Azure Security Data Science, where he works at the intersection of ML and security. Ram's work at Microsoft includes a slew of patents in the large-scale intrusion detection space (called "fundamental and groundbreaking" by evaluators). In addition, he has given talks at internal conferences and received Microsoft's Engineering Excellence award. Ram has previously spoken at data-analytics-focused conferences like Strata San Jose and the Practice of Machine Learning as well as at security-focused conferences like BlueHat, DerbyCon, FireEye Security Summit (MIRCon), and Infiltrate. Ram graduated from Carnegie Mellon University with master's degrees in both ECE and innovation management.

Andrew Wicker

Microsoft (Azure Security Data Science)

Andrew Wicker is a machine learning engineer in the Security division at Microsoft, where his current work focuses on researching and developing machine-learning solutions to protect identities in the cloud. Andrew’s previous work includes developing machine-learning models to detect safety events in an immense amount of FAA radar data and working on the development of a distributed graph analytics system. His expertise encompasses the areas of artificial intelligence, graph analysis, and large-scale machine learning. Andrew holds a BS, an MS, and a PhD in computer science from North Carolina State University.