Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Hadoop under attack: Securing data in a banking domain

Federico Leven (ReactoData)
12:05–12:45 Wednesday, 23 May 2018
Secondary topics:  Security and Privacy
Average rating: 2.67 (3 ratings)

Who is this presentation for?

  • Senior-level leaders (CTOs, CDOs, managers, etc.)

Prerequisite knowledge

  • A basic understanding of Hadoop

What you'll learn

  • Explore an end-to-end security deployment on Hadoop
  • Understand the challenges with meeting current regulations in a Hadoop environment
  • Discover the main technologies and components for authentication, authorization, auditing, and encryption


The apparent difficulty of managing Hadoop compared to more traditional and proprietary data products makes some companies wary of the Hadoop ecosystem, but managing security is becoming more accessible in the Hadoop space, with projects like Ranger, Knox, Sentry, and Metron providing enterprise-ready security features. However, the overall lack of built-in security threatens to hamper the open source platform’s spread in the banking and financial market before it’s really gotten off the ground.

If you set out to build a big data platform for a financial institution today, data security would be one of your top priorities. Every week seems to bring news of yet another massive data breach. In the last two years, big names like Target, Yahoo, and JPMorgan Chase, among others, have collectively lost hundreds of millions of customer records, including names, addresses, credit card numbers, and Social Security numbers. New regulations and legislation with strict directives like the EU’s General Data Protection Regulation (GDPR) make data security a main concern in the enterprise.

Federico Leven offers an overview of an end-to-end security deployment on Hadoop, along with the data governance and security policies implemented.

Topics include:

  • Active Directory integration for users and services
  • Data-in-transit encryption
  • HDFS data encryption
  • Hive authorization via Sentry
  • HDFS authorization
  • Cloudera Manager and HUE LDAP integration
  • Securing edge nodes and Flume agents
  • Sensitive data redaction
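
As a flavor of what several of these items involve at the command line — key management, HDFS transparent encryption, and Sentry-based Hive authorization — here is a minimal sketch. It assumes a Kerberized cluster with the Hadoop KMS and Sentry already configured; all key, path, database, role, and group names are hypothetical:

```shell
# Create an encryption key in the Hadoop KMS (key name is hypothetical)
hadoop key create fin_zone_key

# Create an HDFS encryption zone so files under it are encrypted at rest
hdfs dfs -mkdir /data/financial
hdfs crypto -createZone -keyName fin_zone_key -path /data/financial

# Grant read access on a Hive database through Sentry roles
# (connection string, database, role, and group are hypothetical)
beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "
  CREATE ROLE analysts;
  GRANT SELECT ON DATABASE transactions TO ROLE analysts;
  GRANT ROLE analysts TO GROUP risk_team;
"
```

With this in place, any file written under /data/financial is transparently encrypted, and members of the risk_team group can query the transactions database but not modify it.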

Federico Leven


Federico Leven is the founder and lead data architect at ReactoData, a startup located in Buenos Aires, Argentina, and Wroclaw, Poland, focused on big data, advanced analytics applications, and Hadoop. He also participates in the Open Compute Project, benchmarking big data frameworks; coordinates the big data meetups at IAAR; teaches a hands-on Hadoop lab at the Universidad de Palermo in Argentina; and is a frequent speaker at big data conferences (when he has a good idea to share). He began working with Hadoop in 2012 at Luminar Insights; previously, he was a data warehouse architect and Python developer.