Apache Eagle is an open source monitoring solution that instantly identifies access to sensitive data, recognizes malicious activities, and takes action. Eagle is built for real-time policy evaluation and real-time machine-learning detection on Kafka, Storm, and Spark infrastructure. Eagle audits access to HDFS files and Hive and HBase tables in real time. It enforces policies defined on sensitive data access, and it alerts on or blocks users’ access to that sensitive data in real time. Eagle also creates user profiles based on typical access behavior for HDFS and Hive, and it sends alerts when it detects anomalous behavior. Eagle can also import sensitive-data classifications produced by external classification engines to help define its policies. Eagle uses Kafka to process more than 10 billion security events per day and generates actionable alerts within seconds. Eagle provides an easy-to-use programming API and configuration for consuming any data source. It also ingests high-volume Hadoop audit logs into Kafka through a Log4j appender or a Logstash agent, which requires considerable performance tuning of Kafka. To keep alert latency to a minimum and achieve maximum elasticity, Eagle rebalances its Storm topology in real time. Arun Karthick Manoharan, Edward Zhang, and Chaitali Gupta offer an overview of Eagle. They explain how Eagle helps secure a Hadoop cluster using policy-based detection and alerting as well as detection and alerting based on machine-learned user profiles, and they explore how Eagle is built with scalability and usability in mind.
Arun Karthick Manoharan is a senior product manager at eBay, where he is currently responsible for building data platforms. Prior to eBay, Arun was a product manager for IBM Data Explorer and a product manager at Vivisimo.
Edward Zhang is the core developer and architect of Apache Eagle. Edward has been developing monitoring applications for big data systems at eBay for several years and has deep expertise in distributed systems.
Chaitali Gupta is a senior software engineer on the Hadoop Platform team at eBay. Chaitali holds a PhD in computer science from SUNY Binghamton, where she worked as a research assistant in the Grid Computing Research Laboratory. Her research interests included querying, semantic reasoning, and the management of scientific metadata and web services in large-scale grid computing environments.
©2016, O’Reilly UK Ltd. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.