Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Introducing Venice: A derived datastore for batch, streaming, and lambda architectures

Felix GV (LinkedIn), Yan Yan (LinkedIn)
2:55pm–3:35pm Thursday, September 28, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 23/24
Level: Intermediate
Secondary topics: Architecture, Media, Platform
Average rating: 2.00 (1 rating)

Who is this presentation for?

  • Engineers

Prerequisite knowledge

  • Basic knowledge of Hadoop, Kafka, and key-value stores (useful but not required)

What you'll learn

  • Explore Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency

Description

Companies with batch and stream processing pipelines need to serve the insights they derive back to their users. This is an often-overlooked problem, and solving it reliably and at scale is hard. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency. (LinkedIn runs Venice as a multitenant, self-service, globally replicated system.)

Venice was designed as the next-generation replacement for the Voldemort Read-Only system, with a broader feature set, better availability characteristics, and a more efficient architecture. It supports high-throughput ingestion from both Hadoop and Kafka, and the two sources can be merged at ingestion time. This provides semantics similar to those of a lambda architecture, but with a simpler, faster, and more available read path. Robustness is a primary architectural concern, so Venice provides highly available reads and writes, self-healing, stringent data validation guarantees, and the ability to roll back entire datasets when bad data is pushed.
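The ingestion-time merge and dataset rollback described above can be illustrated with a small toy model. This is a hedged sketch in Python under assumed semantics, not Venice's implementation or API: every name here (ToyHybridStore, push_batch, write_stream, rewind) is hypothetical. It assumes that each batch push creates a new dataset version, that recent streaming writes are replayed onto that version at ingestion time, and that older versions are retained so readers can be switched back to them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyHybridStore:
    """Hypothetical toy model, not Venice's implementation or API."""

    # Each batch push becomes a new version (a dict snapshot of the dataset).
    versions: List[Dict[str, str]] = field(default_factory=list)
    # Index of the version currently served to readers (-1 means empty).
    current: int = -1
    # Arrival-ordered log of streaming writes: (timestamp, key, value).
    stream_log: List[Tuple[int, str, str]] = field(default_factory=list)

    def push_batch(self, batch: Dict[str, str], batch_ts: int, rewind: int) -> None:
        """Ingest a full batch dataset as a new version, then overlay recent stream writes."""
        version = dict(batch)
        # Merge at ingestion time: replay stream writes at or after (batch_ts - rewind)
        # so the new version reflects both the batch output and recent real-time updates.
        for ts, key, value in self.stream_log:
            if ts >= batch_ts - rewind:
                version[key] = value
        self.versions.append(version)
        self.current = len(self.versions) - 1

    def write_stream(self, ts: int, key: str, value: str) -> None:
        """Log a streaming write and apply it to every retained version (a toy design choice)."""
        self.stream_log.append((ts, key, value))
        for version in self.versions:
            version[key] = value

    def get(self, key: str) -> Optional[str]:
        """Serve reads from a single, already-merged version: no read-time merging."""
        if self.current < 0:
            return None
        return self.versions[self.current].get(key)

    def rollback(self) -> None:
        """Switch readers back to the previous version, e.g. after a bad batch push."""
        if self.current <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.current -= 1
```

Because the merge happens at write time, get() reads from one version without the read-time merge of a classic lambda architecture, and rollback() only changes which version is served. A real system would also need to handle replication, partitioning, and failure recovery, which this sketch ignores.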

Felix GV

LinkedIn

Felix GV is a software engineer working on LinkedIn’s data infrastructure. He works on Voldemort and Venice and keeps a close eye on Hadoop, Kafka, Samza, Azkaban, and other systems.

Yan Yan

LinkedIn

Yan Yan is an engineer at LinkedIn, where he works on the Voldemort and Venice team within the company’s data infrastructure organization. He has extensive experience with cluster management, ZooKeeper, Helix, and distributed systems in general.