Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Legacy or Kafka? What an ideal messaging system should bring to Hadoop

Jim Scott (NVIDIA)
17:25–18:05 Thursday, 2 June 2016
IoT & real-time
Location: Capital Suite 14
Level: Intermediate
Average rating: 3.43 (7 ratings)

Prerequisite knowledge

Attendees should have basic knowledge of software architecture and components.


Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.

Messaging solutions have existed for a long time, but in the age of Hadoop, newer systems like Kafka offer higher performance, greater scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are built on drastically different assumptions from those of legacy systems and have vastly different architectures. In light of this, can distributed messaging systems like Kafka be considered replacements for legacy technologies?

Jim Scott answers that question by delving into the architectural details and trade-offs of both legacy and new messaging solutions.

Topics include:

  • Queues versus logs
  • Security issues such as authentication, authorization, and encryption
  • Scalability and performance
  • Handling applications that span multiple data centers
  • Multitenancy considerations
  • APIs, integration points, and more
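The first topic, queues versus logs, is the core architectural split between legacy brokers and Kafka. A minimal sketch of the difference (an illustrative toy implementation, not any real broker's API): a queue hands each message to one consumer and then discards it, while a log retains messages and lets every consumer read independently from its own offset, which is the model Kafka uses.

```python
from collections import deque

class Queue:
    """Toy queue: delivery is a destructive, one-time read."""
    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def consume(self):
        # Once consumed, the message is gone for all other consumers.
        return self._items.popleft() if self._items else None

class Log:
    """Toy log: messages are retained; consumers track their own offsets."""
    def __init__(self):
        self._entries = []

    def publish(self, msg):
        self._entries.append(msg)

    def read(self, offset):
        # Non-destructive read: any consumer can (re)read from any offset.
        return self._entries[offset:]

q = Queue()
q.publish("event-1")
q.publish("event-2")
assert q.consume() == "event-1"   # a second consumer will never see event-1

log = Log()
log.publish("event-1")
log.publish("event-2")
assert log.read(0) == ["event-1", "event-2"]  # full replay from the start
assert log.read(1) == ["event-2"]             # independent consumer offset
```

The replayability of the log is what lets multiple downstream systems consume the same event stream at their own pace, a key reason log-based systems integrate well with Hadoop pipelines.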

Jim Scott


Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).