Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Apache Eagle: Secure Hadoop in real time

14:55–15:35 Friday, 3/06/2016
Location: Capital Suite 10/11 Level: Intermediate
Average rating: *****
(5.00, 1 rating)

Prerequisite knowledge

Attendees should have a basic understanding of the Hadoop ecosystem.


Apache Eagle is an open source monitoring solution that instantly identifies access to sensitive data, recognizes malicious activity, and takes action. Eagle performs real-time policy evaluation and real-time machine-learning detection on Kafka, Storm, and Spark infrastructure. It audits access to HDFS files and to Hive and HBase tables in real time, enforces policies defined on sensitive-data access, and alerts on or blocks users' access to that sensitive data. Eagle also builds user profiles from typical HDFS and Hive access behavior and sends alerts when anomalous behavior is detected, and it can import sensitive-data classifications from external classification engines to help define its policies.

Eagle uses Kafka to process more than 10 billion security events per day and generates actionable alerts within seconds. It provides a simple programming API and configuration for consuming any data source, and it ingests high-volume Hadoop audit logs into Kafka through a Log4j appender or a Logstash agent, which requires substantial performance tuning on the Kafka side. To keep alert latency to a minimum, Eagle rebalances its Storm topologies in real time for maximum elasticity.

Arun Karthick Manoharan, Edward Zhang, and Chaitali Gupta offer an overview of Eagle, explain how Eagle helps secure a Hadoop cluster using policy-based and machine-learning user-profile-based detection and alerting, and explore how Eagle is built with scalability and usability in mind.
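To give a flavor of the kind of policy evaluation described above, here is a minimal, illustrative sketch (not Eagle's actual API) that checks a sensitive-path policy against HDFS audit log lines of the sort Eagle ingests via Kafka. The log format follows the standard HDFS namenode audit log; the policy, path prefix, and monitored commands are hypothetical examples:

```python
import re

# Standard HDFS audit log fields of interest: ugi (user), cmd, src path.
AUDIT_RE = re.compile(r"ugi=(?P<user>\S+).*?cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)")

# Hypothetical policy: alert when anyone reads files under /secure/pii.
SENSITIVE_PREFIX = "/secure/pii"
MONITORED_CMDS = {"open", "getfileinfo", "listStatus"}

def evaluate(line: str):
    """Return an alert dict if the audit event matches the policy, else None."""
    m = AUDIT_RE.search(line)
    if not m:
        return None
    user, cmd, src = m.group("user"), m.group("cmd"), m.group("src")
    if cmd in MONITORED_CMDS and src.startswith(SENSITIVE_PREFIX):
        return {"user": user, "cmd": cmd, "path": src}
    return None

event = ("2016-06-03 14:55:00,123 INFO FSNamesystem.audit: allowed=true "
         "ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=open "
         "src=/secure/pii/users.db dst=null perm=null")
print(evaluate(event))  # → {'user': 'alice', 'cmd': 'open', 'path': '/secure/pii/users.db'}
```

In a real deployment this per-event check would run inside a Storm bolt consuming from Kafka, with policies managed centrally rather than hard-coded.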

Topics include:

  • The essential components of Hadoop data security: access control, data classification, data encryption, and data activity monitoring
  • An introduction to Eagle’s data activity monitoring
  • Use cases for Hadoop security
  • Eagle’s internals: architecture, components, and scalable design
  • Eagle’s scalability: performance numbers and lessons learned from deployment at eBay
  • Policy management and extensibility of Eagle’s framework
  • Machine-learning models in Eagle to create user profiles
  • A demo of an end-to-end use case and information on contributing to the open source project

Arun Karthick Manoharan


Arun Karthick Manoharan is a senior product manager at eBay, where he is currently responsible for building data platforms. Before eBay, Arun was a product manager for IBM Data Explorer and, earlier, at Vivisimo.


Yong Zhang


Yong (Edward) Zhang is the core developer and architect of Apache Eagle. Edward has spent several years building monitoring applications for big data systems at eBay and has deep expertise in distributed systems.


Chaitali Gupta


Chaitali Gupta is a senior software engineer on the Hadoop Platform team at eBay. Chaitali holds a PhD in computer science from SUNY Binghamton, where she worked as a research assistant in the Grid Computing Research Laboratory. Her research interests included querying, semantic reasoning, and management of scientific metadata and web services in large-scale grid computing environments.