When Workflows Attack: In the Trenches with Azkaban, LinkedIn's Open-Source Workflow Scheduler

Richard Park (SkipFlag)
Hadoop in Action Sutton Center - Sutton South
Average rating: ****.
(4.17, 6 ratings)

A vast array of LinkedIn’s data products is created on Hadoop: People You May Know, Endorsements, our A/B test platform, etc. Every day, a multitude of teams need tens of thousands of Hadoop jobs to execute reliably, in order, and on a set schedule. LinkedIn relies on Azkaban, an open-source workflow manager with a web-based UI, to meet these demands.

In the several years since Hadoop and Azkaban’s modest beginnings at LinkedIn, our company has seen incredible growth. Along the way, our Hadoop user base has exploded, our workflows have increased in both number and complexity, and our data infrastructure has changed dramatically. To meet these challenges, Azkaban has had to evolve.

We need Azkaban to reliably serve critical production workflows, support all Hadoop versions, fit our security needs, and scale as clusters expand and workloads grow. Additionally, developers, data scientists, and analysts want Azkaban to remain easy to use, to stay compatible with Hadoop query platforms such as Pig and Hive, and to offer a rich and growing set of features for scheduling, monitoring, and visualizing their workflows.

In this talk, we’ll go through the war stories and lessons learned in supporting these workloads on clusters with over a thousand active users, and show how Azkaban has been redesigned over time to achieve our goals.

Richard Park

Richard Park is a software engineer at LinkedIn and has been a member of their Hadoop Developer group since 2009. He has been an instrumental part of developing LinkedIn’s Hadoop infrastructure. He is the lead developer on Azkaban and has contributed to open-source projects including Apache Kafka. He has previously worked at PayPal in the fraud detection group.

Comments

Marek K Kolodziej
10/30/2013 4:35pm EDT

Would it be possible to post the slides here, like the other speakers have?

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts