Trecul : Data Flow Processing Using LLVM-based JIT Compilation on Top of Hadoop

Hadoop & Beyond, Grand West (NY Hilton)

Akamai’s Advertising team needed a data processing infrastructure for production workloads that existing solutions could not handle well. Frameworks such as Hive and Pig offer excellent scalability, flexible query development, and fault tolerance, but they are generally recognized to be slow: order-of-magnitude slowdowns relative to parallel databases are commonly documented.

After much evaluation, Akamai implemented Trecul, a system that runs inside Hadoop. It uses LLVM for JIT compilation on top of highly optimized standard data processing operators: no Java in tight loops and no interpreter involved in predicate evaluation, just native code executing at line speed within the scale and fault tolerance we’ve come to know and love from Hadoop’s MapReduce execution model. On our standard workloads, it has 10x the throughput of Hive.
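As a rough illustration of why removing the interpreter from predicate evaluation helps, the sketch below contrasts walking an expression tree for every record with translating that tree once into a callable. It is only an analogy written in Python: it is not Trecul's code or API, it does not use LLVM, and the tuple-based expression format and the names `interpret`, `to_source`, and `compile_predicate` are hypothetical.

```python
import operator

# Hypothetical predicate format: nested tuples such as
# ("and", ("gt", "clicks", 3), ("eq", "country", "US")).
OPS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}
PY_OPS = {"gt": ">", "lt": "<", "eq": "=="}


def interpret(expr, record):
    """Evaluate the tree by walking it again for every record."""
    op = expr[0]
    if op == "and":
        return interpret(expr[1], record) and interpret(expr[2], record)
    if op == "or":
        return interpret(expr[1], record) or interpret(expr[2], record)
    field, constant = expr[1], expr[2]
    return OPS[op](record[field], constant)


def to_source(expr):
    """Translate the tree into a Python expression string."""
    op = expr[0]
    if op in ("and", "or"):
        return f"({to_source(expr[1])} {op} {to_source(expr[2])})"
    field, constant = expr[1], expr[2]
    return f"(record[{field!r}] {PY_OPS[op]} {constant!r})"


def compile_predicate(expr):
    """Translate the tree once; the tree is never walked again afterwards."""
    source = f"lambda record: {to_source(expr)}"
    return eval(compile(source, "<predicate>", "eval"))


predicate = ("and", ("gt", "clicks", 3), ("eq", "country", "US"))
records = [
    {"clicks": 5, "country": "US"},
    {"clicks": 1, "country": "US"},
    {"clicks": 9, "country": "DE"},
]

compiled = compile_predicate(predicate)
interpreted_result = [r for r in records if interpret(predicate, r)]
compiled_result = [r for r in records if compiled(r)]
```

Both paths select the same records; the difference is that the translated version pays the cost of inspecting the expression structure once rather than once per record. A system that emits native code through a compiler backend takes the same idea further than this Python analogy can show.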

Trecul is in production today, handling billions of events an hour and powering Akamai’s Advertising systems, including our attribution engine, machine learning-based modeling, and large-scale reporting and insights.

Akamai has open-sourced Trecul on GitHub so that it may be used by others who wish to leverage Hadoop for analytical workloads in performance-critical environments.

In this talk, we will walk through the use cases that led us to write our own processing system, review the highlights of the implementation, including why JIT compilation via LLVM inside Hadoop is great for performance, show some performance benchmarks on real-world data at scale, and discuss how others might leverage this system for their own needs.

David Blair

Akamai Technologies

David Blair is a Principal Software Engineer at Akamai, where he works on the Akamai Data Platform. He has been working with scalable data processing applications for over 10 years, both at Akamai and in his previous roles as Director of Product Architecture at MetraTech, Inc. and Director of Engineering at Torrent Systems. He has a PhD in Mathematics from Brandeis University and a B.S. in Mathematics from the University of California, Berkeley.
