
Big Data Workflows on Mesos Clusters

Florian Leibert (Mesosphere), Paco Nathan, Benjamin Hindman (Apache Mesos)
Hadoop and Beyond
Room 204
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: 4.40 (5 ratings)

Tutorial Prerequisites

Most of the code will be run on clusters on Amazon AWS. Participants will need to SSH into those clusters from their laptops, and will need to follow along with instructional material published on web pages.

The instructors will attempt to provide AWS resources for all of the participants, but if you prefer to use your own AWS credentials (and thus keep clusters running), that’s fine as well.

Tutorial Description

Apache Hadoop is rarely, if ever, used in isolation. Input data comes from other frameworks, and output results get consumed by other frameworks, typically long-running services. Scheduling many frameworks to run on the same multi-tenant cluster addresses problems that might otherwise drive up costs for Big Data apps: it yields better utilization rates, lower latency for updates, less Ops overhead, and so on.

Apache Mesos is an open source cluster manager that provides efficient resource isolation for distributed frameworks, based on “cgroups” in Linux. It is similar to Google’s “Borg” and “Omega” projects for warehouse-scale computing, and has run in production at scale for over two years at Twitter. It is also a foundation for the emerging Berkeley Data Analytics Stack (BDAS), which represents a next generation beyond Hadoop.

This tutorial will provide hands-on experience in building scalable, fault-tolerant data workflows atop Mesos. With Mesos serving as the kernel for a data center OS, the additional open source components Chronos (like Unix “cron”) and Marathon (like Unix “init.d”) serve as the building blocks for creating distributed, fault-tolerant, highly available apps at scale.

We will work with Hadoop running on a Mesos cluster, using Chronos to orchestrate Hadoop jobs and other data preparation, then using Marathon to launch a Rails + Redis application to serve the results.
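As a rough illustration of the workflow described above, the following minimal sketch builds the JSON payloads one might submit to Chronos and Marathon. It makes no network calls. The field names follow the Chronos and Marathon REST APIs as commonly documented and may differ between versions; the job names, shell commands, owner address, and schedule are hypothetical placeholders, not the tutorial's actual material.

```python
import json


def chronos_scheduled_job(name, command, schedule, owner):
    """Build a time-triggered Chronos job (the cron-like case).

    `schedule` is an ISO 8601 repeating interval, e.g. "R/<start>/PT24H".
    """
    return {
        "name": name,
        "command": command,
        "schedule": schedule,
        "epsilon": "PT30M",  # assumed tolerance window for a late start
        "owner": owner,
    }


def chronos_dependent_job(name, command, parents, owner):
    """Build a Chronos job that runs only after its parent jobs succeed."""
    return {
        "name": name,
        "command": command,
        "parents": list(parents),
        "owner": owner,
    }


def marathon_app(app_id, cmd, cpus, mem, instances):
    """Build a Marathon app definition for a long-running service."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "instances": instances,
    }


# Hypothetical pipeline: prepare data nightly, run Hadoop afterwards,
# and keep a web app running to serve the results.
OWNER = "ops@example.com"  # placeholder address

prepare = chronos_scheduled_job(
    name="prepare-input",
    command="./prepare_input.sh",
    schedule="R/2014-01-01T02:00:00Z/PT24H",
    owner=OWNER,
)
hadoop = chronos_dependent_job(
    name="hadoop-aggregate",
    command="hadoop jar aggregate.jar /input /output",
    parents=[prepare["name"]],
    owner=OWNER,
)
web = marathon_app(
    app_id="results-web",
    cmd="bundle exec rails server -p $PORT",
    cpus=0.5,
    mem=512,
    instances=2,
)

if __name__ == "__main__":
    # In a real cluster these payloads would be POSTed to the Chronos
    # and Marathon HTTP endpoints; here they are only printed.
    for payload in (prepare, hadoop, web):
        print(json.dumps(payload, indent=2))
```

Keeping the payloads as plain dictionaries separates the workflow definition from how it is submitted, so the same definitions can be reviewed or version-controlled before they reach a cluster.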

Some familiarity with shell commands on Linux is needed, and prior hands-on experience running Hadoop apps would be helpful. Participants will be given access to cluster resources on Amazon AWS, and need to bring their own laptops for browser and SSH access.

Florian Leibert is the main author of Chronos and a contributor to Marathon. Paco Nathan is an O’Reilly author (“Enterprise Data Workflows with Cascading”) and an expert on Mesos and Cascading use cases.


Florian Leibert

Founder, Mesosphere

Flo was an early engineer at Twitter, where he helped build critical infrastructure for analytics and search. He was primarily responsible for Twitter’s user search product. After a few years at Twitter, Flo joined Airbnb and built out the data infrastructure team. Flo is an early leader in the Apache Hadoop community and has extensive experience using it at scale. He holds a number of patents and has published papers in the areas of information retrieval and large-scale distributed systems. Flo was also the driver and main author of Chronos, a fault-tolerant job dependency scheduler and job orchestration framework built on top of Mesos.


Paco Nathan

Evil Mad Scientist

Paco Nathan is the Chief Scientist at Mesosphere in SF, and a “player/coach” who has led innovative Data teams building large-scale apps for the past decade. He is a recognized expert in distributed systems, machine learning, predictive modeling, and cloud computing. He received his BS in Math Sciences and MS in Computer Science from Stanford, and has 25+ years of experience in the tech industry, ranging from Bell Labs to early-stage start-ups.

Paco is an evangelist for the Mesos and Cascading open source projects, and is also the author of the O’Reilly book “Enterprise Data Workflows with Cascading”.


Benjamin Hindman

Co-Creator, Apache Mesos

Ben Hindman is one of the creators of Apache Mesos, a platform for building and running resource-efficient distributed systems at scale. Ben started working on Mesos as a PhD student at Berkeley before bringing it to Twitter, where it runs on thousands of machines. An academic at heart, he has published his research in programming languages and distributed systems at leading academic conferences.