Spark is an open-source computation platform for big data that supports not only batch-mode (“offline”) data analysis, like MapReduce, but also event-stream processing, embedded SQL queries, and other extensions.
While Spark is most often discussed as a replacement for MapReduce in Hadoop clusters, Spark is actually agnostic to the underlying infrastructure for clustering, so alternative deployments are possible.
Mesos offers resource management and scheduling services comparable to YARN, making it a viable alternative. Its advantages include greater flexibility for clustered applications outside Hadoop and more fine-grained resource management. Its main disadvantage is that many tools in the wider ecosystem require Hadoop, and you might need to use them.
We’ll use several example applications to discuss pragmatic details for Spark on Mesos, including deployment and tuning of streaming, batch-mode, and interactive applications, and integration with databases and distributed file systems. We’ll contrast the performance characteristics of Mesos and YARN. Finally, we’ll make recommendations on when to use Spark on Mesos, when to stick with Hadoop, and how to be successful in either case.
Dean Wampler, Ph.D. is the architect for Big Data Products and Services for Typesafe. He builds scalable, distributed applications using Spark, Hadoop, Mesos, Scala, and the Typesafe Reactive Platform. He is the author of several books for O’Reilly on Scala, Hive, and Functional Programming. Dean is a contributor to several open source projects, and co-organizes several technology conferences and Chicago-based user groups.
©2015, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.