Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Architecting data platforms for cybersecurity

Charaka Goonatilake (Panaseer)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: Capital Suite 7
Level: Intermediate
Secondary topics: Security and Privacy
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Architects, data and platform engineers, and data scientists

Prerequisite knowledge

  • A general awareness of big data technologies such as Spark and the Hadoop ecosystem
  • Security domain knowledge not required

What you'll learn

  • Learn about key challenges in data-driven cybersecurity
  • Understand how various components from the Hadoop and big data ecosystem can be integrated to create a data platform to support cybersecurity use cases
  • Gain insights from real-world use cases to apply when designing your own data platforms


In today’s world, with cyber incidents reported almost daily, security teams are increasingly turning to data for answers. Data-driven approaches can prove immensely valuable in providing visibility to support decision making and drive action across the entire cybersecurity lifecycle. Data platforms that support security operations teams in their reactive efforts to detect and respond to security incidents have existed for a long time, from SIEMs historically to Apache Metron and Apache Spot more recently. In addition, a whole new class of traditionally underserved stakeholders and use cases is emerging: security executives who need strategic decision support to deliver proactive initiatives that measure and mitigate cyber risks.

Designing successful data solutions for the cybersecurity domain can be a daunting task. The diversity of problems to be solved for various stakeholders in and around a security function leads to an array of complex and potentially competing data and analysis requirements. This complexity begins with the need to collect and prepare data of any type from wherever it resides and however it’s exposed. The data must then be stored in a way that can accommodate a range of access patterns. Finally, interfaces must make the data widely accessible so that the full range of platform users can analyze it and consume insights, whatever their level of data analysis skill.

Once you’ve understood your users and their needs, you face the challenge of navigating the vast sea of data technologies vying for your attention to arrive at a solution. With the proliferation of open source and proprietary technology options, each with its own trade-offs, how do you deliver a scalable and flexible data platform that will serve your security organization for years to come?

Charaka Goonatilake explores the key drivers that influence the architecture of a cyber data platform and explains how to deliver on these requirements using open source big data technologies like Spark and the Hadoop ecosystem. Charaka walks you through real-life lessons learned and the successes and failures experienced while building and evolving data platforms.

Topics include:

  • Understanding the requirements: Common security user personas and their data use cases
  • Distilling the requirements: The data and workloads that drive the architecture and technology choices
  • Assembling the data platform from core building blocks to collect, store, analyze, and present data
  • Data collection: Rapidly onboard new datasets without having to rely on data engineers writing code, using GUI-driven data flows in NiFi
  • Data storage: The trade-offs in data stores, such as Hive and HBase, to service diverse data access patterns
  • Data analysis: Using Spark as a unifying framework so that data scientists can build data pipelines for ETL and analysis on security data
  • Data presentation: Integrating the toolset to the data platform that will allow your user base to extract and disseminate value from the data
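
The storage trade-off named in the topics above (serving diverse access patterns from different data stores) can be illustrated with a small, self-contained sketch. It is not taken from the session: it uses plain Python dictionaries as stand-ins for a scan-oriented layout (roughly what a partitioned Hive table offers) and a key-oriented layout (roughly what an HBase row key offers). The event records, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-normalized security events (illustrative data only).
EVENTS = [
    {"host": "web-01", "day": "2018-05-21", "type": "login_failure"},
    {"host": "web-01", "day": "2018-05-22", "type": "malware_alert"},
    {"host": "db-01", "day": "2018-05-22", "type": "login_failure"},
]


def build_scan_layout(events):
    """Group events by day, standing in for a partitioned, scan-oriented table."""
    partitions = defaultdict(list)
    for event in events:
        partitions[event["day"]].append(event)
    return dict(partitions)


def build_lookup_layout(events):
    """Index events by host, standing in for a key-oriented store."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event["host"]].append(event)
    return dict(by_host)


def count_by_type(partitions, day):
    """Analytical access pattern: scan one day's partition and aggregate."""
    counts = defaultdict(int)
    for event in partitions.get(day, []):
        counts[event["type"]] += 1
    return dict(counts)


def events_for_host(by_host, host):
    """Operational access pattern: fetch one host's events by key."""
    return by_host.get(host, [])


if __name__ == "__main__":
    partitions = build_scan_layout(EVENTS)
    by_host = build_lookup_layout(EVENTS)
    print(count_by_type(partitions, "2018-05-22"))
    print(events_for_host(by_host, "web-01"))
```

The same events are stored twice, once per access pattern. That duplication is the kind of cost a platform designer weighs when choosing between a single general-purpose store and several specialized ones.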

Charaka Goonatilake


Charaka Goonatilake is CTO at Panaseer, where he designs and delivers big data solutions that enable chief information security officers and their teams to gain visibility into the true state of security within their business to improve cyber hygiene and reduce cyber risk exposure. Charaka has been immersed in big data technologies since the very early days of Hadoop and has hands-on experience using Hadoop in the enterprise to produce data-driven insights. Over the past eight years, across Panaseer and BAE Systems Applied Intelligence, Charaka has architected and engineered Hadoop-based data platforms for a range of cybersecurity use cases, from security analytics for threat detection to threat intelligence management and cybersecurity risk management.