Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Creating the Ultimate Data Scientist’s Cyber Playground: Building a Multi-Petabyte Analytic Infrastructure for Cyber Defense

Lee Blum (Verint Systems)
14:55–15:35 Wednesday, 23 May 2018

Who is this presentation for?

Big Data Architects, Engineers and Managers

Prerequisite knowledge

Basic knowledge of:
* Big Data applications
* Spark (Batch and Streaming), HDFS, Kafka
* NoSQL database concepts

What you'll learn

Learn how to approach complex Big Data use cases and design a system that:
* Scales to multiple petabytes
* Supports both simple actions and complex analytics
* Has a very low footprint


Lee Blum will give a behind-the-scenes look at the building of Verint’s large-scale Internet Service Provider Cyber Defense system. He will present its complex use case and explain how it was initially approached. In addition, he will reveal Verint’s secret sauce, which enables the system to operate at petabyte scale with query latencies of mere seconds.

Analyzing, Visualizing and Exploring Huge Amounts of Data

Modern large-scale cyber defense systems are essentially based on Data Science and Big Data. However, addressing every aspect of a Data Scientist’s versatile needs is, in itself, not a trivial task.

Cyber evidence and network forensics quickly scale to multi-petabyte repositories built from trillions of tiny shreds of information. Moreover, this is perhaps the most salient example of imbalanced data that data scientists process, with malicious evidence accounting for less than one in a million.

Despite these complex entry barriers, our analytics infrastructure is required to demonstrate interactive response times for user queries, as well as efficient batch operations. All of this must be achieved with an extremely low footprint, suitable for an on-premises solution. The Big Data pipeline orchestrated for this purpose is based on Apache Spark and the Hadoop ecosystem.

An important factor when creating the Cyber Defense system was enabling our Data Scientists to feel at home when developing algorithms. This was achieved by using a wide range of use cases and by implementing methods with which they are familiar from their work in the research group.

We will present this slim but highly effective solution, together with stories of the challenges we met along the way and how we overcame them.

Lee Blum

Verint Systems

Lee Blum – Common Technology Center, Verint Systems

Lee is a Big Data Architect in the Verint Common Technology Center, where he is responsible for designing Big Data solutions for large-scale Cyber Defense systems. In his role, Lee applies the latest Big Data technologies to provide rapid ingestion, processing, and advanced analytics of data collected by high-end cyber probes in Internet Service Provider networks. With over 15 years of experience in network-oriented back-end development, Big Data architecture, and analytics, Lee works with the Product Management, Research, and Engineering teams to support the realization and implementation of advanced algorithms and data analytics in petabyte-scale data repositories.
