Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Genji: A framework for building resilient near-real-time data pipelines

Swaminathan Sundaramurthy (Salesforce Inc), Mark Cho (Pinterest)
2:25pm–3:05pm Wednesday, October 4, 2017

Who is this presentation for?

  • Data leaders, engineers, and engineering managers

Prerequisite knowledge

  • A basic understanding of data warehousing (useful but not required)

What you'll learn

  • Explore Pinterest's near-real-time data warehouses

Description

Pinterest operates on data at petabyte scale. Previously, the company’s fact tables were generated daily using Hadoop, resulting in data that was frequently 24–48 hours old. To support real-time decision making, stats, and analytics, Pinterest modeled its warehouse on a quasi-Kappa architecture, treating batch processing as a special case of stream processing and warehousing data with sub-15-minute lag.
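
The idea of treating batch as a special case of streaming can be sketched roughly as follows. This is an illustration only and not Genji's code: the event shape, the `window_counts` function, and the 15-minute window size are all assumptions chosen to mirror the abstract. The same function serves a live, unbounded stream and a bounded replay of historical events; the "batch" case is simply a stream that ends.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (epoch_seconds, key). Not Pinterest's schema.
Event = Tuple[int, str]

# Assumed window size, mirroring the sub-15-minute lag in the abstract.
WINDOW_SECONDS = 15 * 60


def window_counts(events: Iterable[Event]) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_by_key) each time a window closes.

    Assumes events arrive in timestamp order. Works on any iterable,
    so a finite list (a batch replay) and an endless generator
    (a live stream) go through the same code path.
    """
    current_window = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        window = ts - ts % WINDOW_SECONDS
        if current_window is not None and window != current_window:
            # A new window started: emit the finished one.
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        # The input ended (bounded replay): flush the last open window.
        yield current_window, dict(counts)


# "Batch" run: replay a bounded slice of history through the stream logic.
replay = [(0, "click"), (60, "click"), (900, "view"), (1000, "click")]
results = list(window_counts(replay))
```

In a design like this, backfills and reprocessing reuse the streaming logic instead of maintaining a separate batch job, which is the usual motivation for a Kappa-style approach.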

Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest’s real-time data pipeline, discussing the company’s decision to warehouse data in near real time so that downstream systems can operate on much fresher data, the platform’s architecture, and its impact on Pinterest’s systems, tools, and processes. They conclude by demonstrating how Pinterest models real-time ads analytics use cases on the platform and sharing lessons learned along the way.

Swaminathan Sundaramurthy

Salesforce Inc

Swaminathan Sundaramurthy is a Director of Engineering at Salesforce Einstein, where he manages the Machine Learning Services and Orchestration teams. Prior to Salesforce, Swami worked at Pinterest, where he initiated and managed the company’s stream platform and machine learning training platform, and managed anti-spam and fraud efforts. He began his career as an individual contributor, spending more than 12 years building distributed systems and cloud platforms at Amazon, Yahoo, Microsoft, and Ask Jeeves. Swami is passionate about technology, distributed systems, promoting diversity, and eliminating bias in the workplace.

Mark Cho

Pinterest

Mark Cho is a software engineer at Pinterest.