Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x performance improvement to Qunar’s streaming processing

Xueyan Li (Qunar), Yupeng Fu (Alluxio)
12:05–12:45 Thursday, 25 May 2017
Level: Beginner

Who is this presentation for?

  • Software engineers and data scientists

Prerequisite knowledge

  • Basic knowledge about streaming and distributed storage

What you'll learn

  • How stream processing on Alluxio performs under real-world workloads at Qunar, and how to position Alluxio in a streaming architecture

Description

Alluxio, the first memory-speed virtual distributed storage system in the world, unifies data from various under storage systems and presents a global namespace to various computation frameworks. Data access can be several orders of magnitude faster because of Alluxio’s memory-centric architecture. In addition, Alluxio’s tiered storage, unified namespace, flexible file API, web UI, and command-line tools make it easier to use across different application scenarios. The Alluxio open source project is one of the fastest-growing big data projects, with more than 600 contributors from more than 100 companies across the world.

Qunar is the number-one Chinese-language online travel information provider and search engine for web-based and mobile users. Currently, Qunar’s streaming platform processes around 6 billion system log entries (~4.5 TB) daily. Many jobs running on the platform are business critical and therefore impose strict requirements on both stability and latency. For example, real-time user recommendations are generated mainly from log analysis of a user’s click behavior and search patterns. The faster the analysis iterates, the more accurate the feedback that Qunar can deliver to its users. Low latency and high stability are therefore the top priorities of its system.

Xueyan Li and Yupeng Fu explore how Alluxio has improved the performance of stream processing workloads at Qunar, by an average of 300x at service peak time.

Xueyan Li

Qunar

Xueyan Li is a data platform R&D engineer at Qunar, where he is mainly responsible for the ongoing integration and development of the resource management system Mesos and the distributed memory management system Alluxio, as well as for the shared data services that support all business lines. Other focuses include the ELK log ETL platform, Spark, Storm, Flink, and Zeppelin. He holds a degree in software engineering from Heilongjiang University.

Yupeng Fu

Alluxio

Yupeng Fu is a software engineer at Alluxio and a PMC member of the Alluxio open source project. Previously, Yupeng worked at Palantir, where he led the effort to build the company’s storage solution. Yupeng holds a BS and an MS from Tsinghua University and has completed coursework toward a PhD at UCSD.
