Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid

Tony Ng (WeWork)
4:20pm–5:00pm Thursday, 03/31/2016
IoT and Real-time

Location: 210 C/G
Tags: real-time
Average rating: 4.11 (9 ratings)

Prerequisite knowledge

Attendees should have a basic knowledge of SQL, OLAP, event processing, and messaging systems.

Description

Enterprises are increasingly demanding real-time analytics and insights. Tony Ng offers an overview of Pulsar, an open source real-time streaming system used at eBay, which can scale to millions of events per second with 4GL SQL-like language support. Pulsar provides real-time sessionization, multidimensional metrics aggregation over time windows, and custom stream creation through data enrichment, filtering, and stateful processing. Tony explains how Pulsar integrates Kafka, Kylin, and Druid to provide flexibility and scalability in event and metrics consumption.

Topics include:

  • Real-time analytics and its applications, such as personalization, monitoring, and marketing
  • Pulsar’s real-time analytics pipeline
  • Pulsar’s architecture to support high scalability and availability
  • Pulsar’s event-processing framework and language
  • Integration of Pulsar with Kafka to replay unprocessed or undelivered events and avoid data loss
  • Integration of Pulsar with Kylin to provide multidimensional slicing and dicing of data
  • Integration of Pulsar with Druid to provide real-time metrics and dashboards
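
To make two of the ideas in the description concrete, here is a minimal Python sketch of sessionization and of metrics aggregation over time windows. It is an illustration only and does not use Pulsar's API or its SQL-like language. The event shape, the 30-minute inactivity gap, the 60-second tumbling window, and the function names `sessionize` and `count_by_window` are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (user_id, timestamp_in_seconds, page)
Event = Tuple[str, int, str]

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not a Pulsar default
WINDOW_SECONDS = 60            # assumed tumbling-window size


def sessionize(
    events: Iterable[Event], gap: int = SESSION_GAP_SECONDS
) -> Dict[str, List[List[Event]]]:
    """Split each user's events into sessions.

    A new session starts when more than `gap` seconds have passed
    since that user's previous event.
    """
    sessions: Dict[str, List[List[Event]]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e[1]):  # process in time order
        user, ts, _page = event
        user_sessions = sessions[user]
        if user_sessions and ts - user_sessions[-1][-1][1] <= gap:
            user_sessions[-1].append(event)  # continue the open session
        else:
            user_sessions.append([event])    # start a new session
    return dict(sessions)


def count_by_window(
    events: Iterable[Event], window: int = WINDOW_SECONDS
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, page) over fixed tumbling windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for _user, ts, page in events:
        window_start = ts - ts % window  # align timestamp to its window
        counts[(window_start, page)] += 1
    return dict(counts)


sample_events: List[Event] = [
    ("u1", 0, "home"),
    ("u1", 100, "search"),
    ("u1", 4000, "home"),
    ("u2", 50, "home"),
]
sample_sessions = sessionize(sample_events)
sample_counts = count_by_window(sample_events)
```

This sketch processes a finite batch in memory. A streaming system such as the one described in the talk would instead keep per-user session state and window counters as events arrive, and expire them as time advances.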

Tony Ng

WeWork

Tony Ng is a Sr. Director of Engineering at WeWork, where he is responsible for WeWork’s Data Platform.