Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Architecting Data Platforms for Cyber Security

Charaka Goonatilake (Panaseer)
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: Capital Suite 7
Level: Intermediate
Secondary topics: Security and Privacy

Who is this presentation for?

Architects, Data & Platform Engineers and Data Scientists focused on the tools needed to effectively apply data science to cyber security

Prerequisite knowledge

• A general awareness of big data technologies such as Spark and the Hadoop ecosystem
• Security domain knowledge is not required

What you'll learn

• Learn about key challenges in data-driven cyber security
• Understand how various components from the Hadoop and big data ecosystem can be integrated to create a data platform to support cyber security use cases
• Gain insights from real-world use cases to apply when designing your own data platforms


In today’s world, with cyber incidents reported almost daily, security teams in many organisations are increasingly turning to data for answers. Data-driven approaches can prove immensely valuable in providing visibility to support decision-making and drive action across the entire cyber security lifecycle. Data platforms that support security operations teams in their reactive efforts to detect and respond to security incidents have existed for a long time, from SIEMs historically to Apache Metron and Spot more recently. Beyond these, a new class of traditionally underserved stakeholders and use cases is emerging: security executives who need strategic decision support to deliver proactive initiatives that measure and mitigate cyber risks.

Designing successful data solutions for the cyber security domain can be a daunting task. Firstly, the diversity of problems to be solved for various stakeholders in and around a security function leads to an array of complex and potentially competing data and analysis requirements.

The complexity arises initially from the need to collect and prepare data of any type from wherever it resides and however it’s exposed. Then the data must be stored in a way that can accommodate a range of access patterns. Finally, interfaces must exist that make the platform widely accessible, allowing its range of users to analyse the data and consume insights whatever their level of data analysis skill.

Once you’ve understood the users and their needs, you face the challenge of navigating the vast sea of data technologies vying for your attention to arrive at a solution. With the proliferation of open-source and proprietary technology options, each with their own trade-offs, how do you deliver a scalable and flexible data platform that will serve your security organisation for years to come?

In this session, we’ll explore the key drivers that influence the architecture of a cyber data platform and how to deliver on these requirements with open-source big data technologies such as Spark and the Hadoop ecosystem. We’ll walk through real-life lessons learned, both successes and failures, with examples based on the experience of building and evolving data platforms.

Topics covered:
• Understanding the requirements – Common security user personas and their data use cases
• Distilling the requirements – The data and workloads that drive the architecture and technology choices
• Assembling the Data Platform from Core Building Blocks to Collect, Store, Analyse and Present data
• Data Collection – Rapidly on-board new datasets, without having to rely on data engineers writing code, using GUI-driven data flows in NiFi
• Data Storage – Understanding trade-offs in data stores, such as Hive & HBase, to service diverse data access patterns
• Data Analysis – Using Spark as a unifying framework so that data scientists can build data pipelines for ETL and analysis on security data
• Data Presentation – Integrating the toolset with the data platform so that your user base, from developers to consumers, can extract and disseminate value from the data


Charaka Goonatilake


Charaka Goonatilake is CTO at Panaseer, where he has designed and delivered big data solutions for Chief Information Security Officers and their teams to gain visibility into the true state of security within their business in order to improve cyber hygiene and reduce cyber risk exposure. He has been immersed in big data technologies since the very early days of Hadoop, giving him hands-on experience of making Hadoop work in the enterprise to produce data-driven insights. Over the past 8 years, across Panaseer and BAE Systems Applied Intelligence, Charaka has architected and engineered Hadoop-based data platforms for a range of cyber security use cases, from security analytics for threat detection to threat intelligence management and cyber security risk management.
