Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Spark on Mesos

Dean Wampler (Lightbend)
14:35–15:15 Thursday, 7/05/2015
Hadoop & Beyond
Location: Buckingham Room - Palace Suite
Average rating: 4.70 (10 ratings)

Prerequisite Knowledge

This is a talk for developers and architects. Familiarity with Hadoop is assumed; no prior Spark or Mesos experience is required.

Description

Spark is an open-source computation platform for big data. It supports batch-mode (“offline”) data analysis, just like MapReduce, as well as processing of event streams, embedded SQL queries, and other extensions.

While Spark is most often discussed as a replacement for MapReduce in Hadoop clusters, Spark is actually agnostic to the underlying infrastructure for clustering, so alternative deployments are possible.
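As a minimal sketch of what this agnosticism looks like in practice, the same application can be pointed at a different cluster manager by changing only the master URL passed to spark-submit. The helper function, host names, ports, and application file name below are hypothetical placeholders, not taken from the talk, and the exact master strings accepted can vary between Spark versions.

```python
# Hypothetical helper: builds a spark-submit command line for a chosen
# cluster manager. Host names, ports, and the app file are placeholders.

MASTER_URLS = {
    "local": "local[*]",                       # run in-process on all local cores
    "standalone": "spark://master-host:7077",  # Spark's built-in cluster manager
    "mesos": "mesos://mesos-host:5050",        # a Mesos master
    "yarn": "yarn",                            # YARN; cluster found via Hadoop config
}


def spark_submit_command(manager, app="my_app.py"):
    """Return the spark-submit argument list for the chosen cluster manager."""
    if manager not in MASTER_URLS:
        raise ValueError(f"unknown cluster manager: {manager}")
    return ["spark-submit", "--master", MASTER_URLS[manager], app]


if __name__ == "__main__":
    # Only the --master value differs between deployments.
    for name in MASTER_URLS:
        print(" ".join(spark_submit_command(name)))
```

The application code itself is unchanged across these deployments; only the value given to --master differs.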

Mesos offers resource management and scheduling services comparable to YARN, making it a viable alternative. The advantages of Mesos include greater flexibility for non-Hadoop clustered applications and more fine-grained resource management. The main disadvantage is that many tools in the wider ecosystem require Hadoop, and you might need to use them.

We’ll use several example applications to discuss pragmatic details for Spark on Mesos, including streaming, batch-mode, and interactive application deployment tuning, and integration with databases and distributed file systems. We’ll contrast the performance characteristics of Mesos and YARN. Finally, we’ll make recommendations on when to use Spark on Mesos, when to stick with Hadoop, and how to be successful in either case.

Dean Wampler

Lightbend

Dean Wampler, Ph.D., is the architect for Big Data Products and Services at Typesafe. He builds scalable, distributed applications using Spark, Hadoop, Mesos, Scala, and the Typesafe Reactive Platform. He is the author of several books for O’Reilly on Scala, Hive, and functional programming. Dean contributes to several open source projects and co-organizes several technology conferences and Chicago-based user groups.