Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Spark on Mesos

Dean Wampler (Lightbend)
2:55pm–3:35pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Average rating: 4.33 (6 ratings)

Apache Spark is an open-source computation platform for big data. Like MapReduce, it supports batch-mode (offline) data analysis, but it also supports event-stream processing, embedded SQL queries, and other extensions.

While Spark is most often discussed as a replacement for MapReduce in Hadoop clusters, it is actually agnostic to the underlying cluster infrastructure, so alternative deployments are possible.

Mesos offers a superset of YARN's resource management and scheduling services, making it a viable alternative. Its advantages include greater flexibility for clustered applications outside Hadoop and more fine-grained resource management. Its main disadvantage is that many ecosystem tools you might need to use require Hadoop.

We’ll use several example applications to discuss pragmatic details of running Spark on Mesos, including deploying and tuning streaming, batch-mode, and interactive applications, and integrating with databases and distributed file systems. We’ll contrast the performance characteristics of Mesos and YARN, and we’ll describe the Myriad project, which integrates YARN with Mesos for a hybrid solution. Finally, we’ll make recommendations on when to use Spark on Mesos, when to use Hadoop instead, and how to be successful in either case.


Dean Wampler


Dean Wampler is the vice president of fast data engineering at Lightbend, where he leads the creation of the Lightbend Fast Data Platform, a distribution of scalable, distributed stream processing tools, including Spark, Flink, Kafka, and Akka, along with machine learning and management tools. Dean is the author of Programming Scala and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly. He is a contributor to several open source projects. A frequent Strata speaker, he also co-organizes several conferences around the world and several user groups in Chicago.

Comments


Dean Wampler
10/05/2015 9:40am EDT

I gave the slides to the conference right before the talk on Thursday. I believe they will be posted soon with the video. I also have the slides here:

Sarah Walker
10/05/2015 7:52am EDT

Can we get the slides from the talk posted? Thanks!