Apache Spark is an open-source computation platform for big data that supports batch-mode (offline) data analysis, like MapReduce, as well as event-stream processing, embedded SQL queries, and other extensions.
While Spark is most often discussed as a replacement for MapReduce in Hadoop clusters, it is actually agnostic to the underlying cluster infrastructure, so alternative deployments, such as on Apache Mesos, are possible.
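One concrete sign of this agnosticism is that the same application can be handed to `spark-submit` for different cluster managers by changing little more than the `--master` URL. The sketch below assumes Spark 2.x-style master URLs (`local[*]`, `spark://HOST:7077`, `mesos://HOST:5050`, `yarn`); the host names, the `wordcount.py` script, and the helper `spark_submit_args` are hypothetical placeholders, not part of the talk.

```python
from typing import List

# Hypothetical hosts; the URL schemes are the ones spark-submit accepts.
MASTER_URLS = {
    "local": "local[*]",                        # all cores on a single machine
    "standalone": "spark://spark-master:7077",  # Spark's built-in cluster manager
    "mesos": "mesos://mesos-master:5050",       # an Apache Mesos master
    "yarn": "yarn",                             # Hadoop YARN, located via HADOOP_CONF_DIR
}


def spark_submit_args(manager: str, app: str, *app_args: str) -> List[str]:
    """Build a spark-submit command line in which only --master depends on the manager."""
    if manager not in MASTER_URLS:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    return ["spark-submit", "--master", MASTER_URLS[manager], app, *app_args]


if __name__ == "__main__":
    # Print the same submission for every cluster manager.
    for name in MASTER_URLS:
        print(" ".join(spark_submit_args(name, "wordcount.py", "input.txt")))
```

In practice each manager also has its own settings (deploy mode, resource limits, credentials), so this shows the common core rather than a complete deployment recipe.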
Mesos offers a superset of YARN's resource management and scheduling services, making it a viable alternative. Its advantages include greater flexibility for clustered applications outside Hadoop and finer-grained resource management. A disadvantage is that you might need other ecosystem tools that require Hadoop.
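Inside Spark, Mesos resource sharing surfaces as a few configuration properties. The sketch below renders three real Spark properties as `spark-submit --conf` flags: `spark.mesos.coarse` (coarse-grained mode, where long-lived executors hold their resources, versus the fine-grained mode that launches one Mesos task per Spark task and has been deprecated since Spark 2.0), `spark.cores.max`, and `spark.executor.memory`. The helper name `mesos_conf_args` and the example values are assumptions for illustration, not tuning recommendations from the talk.

```python
from typing import Dict, List, Optional


def mesos_conf_args(coarse: bool = True,
                    max_cores: Optional[int] = None,
                    executor_memory: Optional[str] = None) -> List[str]:
    """Render a few Mesos-related Spark properties as spark-submit --conf flags."""
    if max_cores is not None and not coarse:
        # The Spark docs describe spark.cores.max as applying to Mesos only in
        # coarse-grained mode, so reject the combination rather than ignore it.
        raise ValueError("spark.cores.max requires coarse-grained mode on Mesos")
    props: Dict[str, str] = {
        # true: long-lived executors keep their Mesos resources for the app's lifetime.
        # false: one Mesos task per Spark task (fine-grained; deprecated since Spark 2.0).
        "spark.mesos.coarse": "true" if coarse else "false",
    }
    if max_cores is not None:
        # Upper bound on the total CPU cores the application claims across the cluster.
        props["spark.cores.max"] = str(max_cores)
    if executor_memory is not None:
        # Memory per executor process, e.g. "4g"; Spark's default is 1g.
        props["spark.executor.memory"] = executor_memory
    args: List[str] = []
    for key, value in props.items():  # dicts preserve insertion order (Python 3.7+)
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `mesos_conf_args(max_cores=8, executor_memory="4g")` yields flags that could be appended to a `spark-submit` command targeting a `mesos://` master.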
We’ll use several example applications to discuss pragmatic details of running Spark on Mesos, including deploying and tuning streaming, batch-mode, and interactive applications, and integrating with databases and distributed file systems. We’ll contrast the performance characteristics of Mesos and YARN, and we’ll describe the Myriad project, which integrates YARN with Mesos for a hybrid solution. Finally, we’ll make recommendations on when to use Spark on Mesos, when to use Hadoop instead, and how to be successful in either case.
Dean Wampler is the vice president of fast data engineering at Lightbend, where he leads the creation of the Lightbend Fast Data Platform, a distribution of scalable, distributed stream processing tools including Spark, Flink, Kafka, and Akka, with machine learning and management tools. Dean is the author of Programming Scala and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly. He contributes to several open-source projects. A frequent Strata speaker, he also co-organizes several conferences around the world and several user groups in Chicago.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.