Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense

Lee Blum (Verint Systems)
14:55–15:35 Wednesday, 23 May 2018

Who is this presentation for?

  • Big data architects, engineers, and managers

Prerequisite knowledge

  • A basic understanding of big data applications, Spark (batch and streaming), HDFS, Kafka, and NoSQL database concepts

What you'll learn

  • Explore Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records


Modern large-scale cyber-defense systems are essentially based on data science and big data. However, addressing every aspect of data scientists’ versatile needs is not a trivial task. Cyber evidence and network forensics quickly scale to multipetabyte repositories made up of trillions of tiny shreds of information. Moreover, in perhaps the most salient example of imbalanced data, malicious evidence accounts for less than one case in a million. Despite these barriers, the analytics infrastructure must deliver interactive response times for user queries along with efficient batch operations, and it must do so with an extremely low footprint, suitable for an on-premises solution.

Lee Blum offers an overview of Verint’s large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company’s extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. The system’s big data pipeline is based on Apache Spark and the Hadoop ecosystem. An important factor when creating the cyber-defense system was to enable Verint’s data scientists to feel at home when developing algorithms, which the company achieved by incorporating a wide range of use cases and implementing methods familiar to data scientists.

Lee Blum

Verint Systems

Lee Blum is a big data architect at Verint’s Common Technology Center, where he is responsible for designing big data solutions for large-scale cyber-defense systems. In his role, Lee applies the latest big data technologies to provide rapid ingestion, processing, and advanced analytics of data collected by high-end cyber probes in internet service provider networks. He also works with the product management, research, and engineering teams to support the implementation of advanced algorithms and data analytics in petabyte-scale data repositories. Lee has over 15 years of experience in network-oriented and backend development, big data architecture, and analytics.