Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Achieving real-time ingestion and analysis of security events through Kafka and Metron

Kevin Mao (Capital One)
4:20pm–5:00pm Wednesday, March 15, 2017
Data engineering and architecture
Location: LL20 C | Level: Intermediate
Secondary topics: Data Platform, Financial services, Streaming
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data engineers

Prerequisite knowledge

  • Conceptual knowledge of Apache NiFi, Apache Kafka, and Apache Storm
  • A cursory understanding of Amazon Web Services (AWS)

What you'll learn

  • Understand the tools needed to construct a data pipeline built specifically to collect information security data
  • Explore a demo of transforming raw device data into organized, columnar, Hadoop-compatible formats (Avro, ORC, Parquet) that can be used for batch analysis (a rough sketch follows this list)
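
As a rough illustration of that last bullet, here is a minimal sketch (not taken from the session materials) of converting raw JSON security events into day-partitioned ORC files on S3 with PySpark; the bucket, paths, and field names are hypothetical.

    # Hypothetical sketch: raw JSON security events -> partitioned ORC on S3.
    # Bucket, paths, and field names are illustrative, not from the talk.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("security-events-to-orc").getOrCreate()

    # Read newline-delimited JSON events landed by the collection layer
    raw = spark.read.json("s3a://example-bucket/raw/security-events/2017/03/15/")

    # Light normalization: cast the timestamp and derive a daily partition column
    events = (raw
              .withColumn("event_ts", F.col("timestamp").cast("timestamp"))
              .withColumn("dt", F.to_date(F.col("event_ts"))))

    # Write columnar ORC, partitioned by day, for long-term batch analysis
    (events.write
           .mode("append")
           .partitionBy("dt")
           .orc("s3a://example-bucket/refined/security-events/"))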

Description

Today’s enterprise architectures are often composed of a myriad of heterogeneous devices. Bring-your-own-device policies, vendor diversification, and the transition to the cloud all contribute to a sprawling infrastructure, the complexity and scale of which can only be addressed by using modern distributed data processing systems.

Kevin Mao outlines the system that Capital One has built to collect, clean, and analyze the security-related events occurring within its digital infrastructure. Raw data from each component is collected and preprocessed using Apache NiFi flows. This raw data is then written into an Apache Kafka cluster, which serves as the primary communications backbone of the platform. The raw data is parsed, cleaned, and enriched in real time via Apache Metron and Apache Storm and ingested into Elasticsearch, allowing operations teams to detect and monitor events as they occur. The refined data is also transformed into the Apache ORC data format and stored in Amazon S3, allowing data scientists to perform long-term, batch-based analysis.
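
As a hedged sketch of the hand-off into Kafka described above, a collector might publish a raw event with the kafka-python client as shown below; the broker addresses, topic name, and event fields are assumptions for illustration, not details from the talk.

    # Hypothetical sketch: publishing a raw security event onto the Kafka backbone
    # for downstream Storm/Metron topologies to consume. Brokers, topic, and fields
    # are illustrative assumptions, not details from the talk.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    raw_event = {
        "source": "firewall-07",
        "timestamp": "2017-03-15T16:20:00Z",
        "src_ip": "10.0.0.12",
        "dst_ip": "203.0.113.5",
        "action": "DENY",
    }

    # Key by source device so one device's events stay ordered within a partition
    producer.send("raw-security-events", key=b"firewall-07", value=raw_event)
    producer.flush()

Keying by source device keeps each device's events ordered within a single partition, a standard Kafka property that downstream stream-processing consumers can rely on.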

Kevin discusses the challenges involved with architecting and implementing this system, such as data quality, performance tuning, and the impact of additional financial regulations relating to data governance, and shares the results of these efforts and the value that the data platform brings to Capital One.

Kevin Mao

Capital One

Kevin Mao is a senior data engineer at Capital One Financial Services, currently working on the Cybersecurity Data Lake team within Capital One’s Enterprise Data Services organization. Kevin’s current work involves designing and developing tools to ingest and transform cybersecurity-related data streams from across the organization into datasets that are used by security analysts for detecting and forecasting cyberthreats. Kevin holds a BS in computer science from the University of Maryland, Baltimore County and an MS in computer science from George Mason University. In his free time, he enjoys hiking, running, climbing, and snowboarding.

Comments on this page are now closed.

Comments

Kevin Mao | SENIOR DATA ENGINEER
03/29/2017 5:03am PDT

Apologies, mistakenly linked SlideShare again. Here is the direct link.

Kevin Mao | SENIOR DATA ENGINEER
03/29/2017 5:01am PDT

You can also directly download the slide deck here.

Aditya Verma | SENIOR SOLUTIONS DELIVERY MANAGER
03/28/2017 12:40am PDT

Hello Kevin,

Not able to get the deck. Can you check the location?

Kevin Mao | SENIOR DATA ENGINEER
03/18/2017 11:14pm PDT

Hello, I’ve posted the deck on SlideShare here.

Mahesh Narayana | SR DIRECTOR, DATA ENGINEERING.
03/17/2017 3:17am PDT

Hi, I need the presentation deck.

How can I download it?

Thank you,
Mahesh.