Modern large-scale cyber-defense systems are essentially based on data science and big data. However, addressing every aspect of data scientists’ versatile needs is not a trivial task. Cyber evidence and network forensics quickly scale to multipetabyte repositories constructed of trillions of tiny shreds of information. Moreover, in perhaps the most salient example of imbalanced data, malicious evidence accounts for less than one case in a million. Despite these complex entry barriers, an analytics infrastructure is required to demonstrate interactive response times for user queries, along with efficient batch operations. All these aspects must be achieved using an extremely low footprint, suitable for an on-premises solution.
Lee Blum offers an overview of Verint’s large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company’s extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results. The system’s big data pipeline is based on Apache Spark and the Hadoop ecosystem. An important factor when creating the cyber-defense system, was to enable Verint’s data scientists to feel at home when developing algorithms, which the company achieved by incorporating a wide range of use cases and implementing methods familiar to data scientists.
Lee Blum is Verint’s Product Manager for Big Data Analytics in the Cyber Intelligence division. He is responsible for Big Data solutions on Large Scale Cyber systems, providing rapid ingestion, processing and advanced analytics of data, collected by high-end cyber probes. Lee has over 15 years of experience in IP networks, back-end development and petabyte-scale Big Data analytics.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org