Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

Lee Blum (Verint Systems)
14:55–15:35 Wednesday, 23 May 2018

Who is this presentation for?

  • Big data architects, engineers, and managers

Prerequisite knowledge

  • A basic understanding of big data applications, Spark (batch and streaming), HDFS, Kafka, and NoSQL database concepts

What you'll learn

  • Explore Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records

Description

Modern large-scale cyber-defense systems are essentially based on data science and big data. However, addressing every aspect of data scientists’ versatile needs is not a trivial task. Cyber evidence and network forensics quickly scale to multipetabyte repositories constructed of trillions of tiny shreds of information. Moreover, in perhaps the most salient example of imbalanced data, malicious evidence accounts for less than one case in a million. Despite these complex entry barriers, an analytics infrastructure is required to demonstrate interactive response times for user queries, along with efficient batch operations. All these aspects must be achieved using an extremely low footprint, suitable for an on-premises solution.

Lee Blum offers an overview of Verint’s large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company’s extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. The system’s big data pipeline is based on Apache Spark and the Hadoop ecosystem. An important goal in creating the cyber-defense system was to enable Verint’s data scientists to feel at home when developing algorithms, which the company achieved by incorporating a wide range of use cases and implementing methods familiar to data scientists.

Lee Blum

Verint Systems

Lee Blum is Verint’s Product Manager for Big Data Analytics in the Cyber Intelligence division. He is responsible for big data solutions for large-scale cyber systems, providing rapid ingestion, processing, and advanced analytics of data collected by high-end cyber probes. Lee has over 15 years of experience in IP networks, back-end development, and petabyte-scale big data analytics.