Spark is an open-source computation platform for Big Data. All the major Hadoop vendors have embraced Spark as a replacement for MapReduce, the venerable standard for writing Hadoop jobs. This talk explains why this change was necessary.
MapReduce has three major deficiencies that needed to be fixed.
We’ll see how Spark addresses all three. It provides a high-level API that lets large MapReduce programs be rewritten as small “scripts”. An integrated SQL query engine provides the best of both worlds: SQL queries for asking questions and a “Turing-complete”, general-purpose programming model for other chores. Spark also performs well, in some workloads up to 100x faster than comparable MapReduce programs. Finally, Spark supports stream processing.
We’ll also see that the secret to Spark’s success is its roots in the Scala programming language and the world of Functional Programming, which together provide powerful, composable primitives that make it easier for developers to create a wide variety of high-performance applications.
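To make the point about composable primitives concrete, here is a minimal sketch of the classic word count, written with plain Scala collections so it runs without a Spark installation. It is an illustration, not code from the talk; the object name `WordCount`, the method `wordCount`, and the sample lines are hypothetical. Spark’s RDD API offers operations with the same names and shapes (`flatMap`, `map`, `filter`), which is why a Spark job can read much like this.

```scala
// Hypothetical example: word count using only the Scala standard library.
// Each step is a small, composable combinator; chaining them replaces the
// separate Mapper and Reducer classes a MapReduce job would need.
object WordCount {
  // Split lines into lowercase words, then count occurrences of each word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("""\W+""")) // tokenize on runs of non-word characters
      .filter(_.nonEmpty)                      // drop empty tokens left by leading punctuation
      .groupBy(identity)                       // word -> every occurrence of that word
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark replaces MapReduce", "MapReduce is verbose; Spark is concise")
    // Print the counts sorted by word so the output order is stable.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

In Spark itself, the same pipeline is commonly written against an RDD, for example `sc.textFile(path).flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)`, where `sc` is a `SparkContext`. `reduceByKey` is usually preferred over `groupBy` there because it combines counts within each partition before shuffling data across the cluster.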
We’ll demonstrate these points in the context of several example applications.
Dean Wampler is a Big Data Specialist for Typesafe. He builds scalable, distributed, “Big Data” applications using the Typesafe Reactive Platform, Spark, Hadoop, and other tools. He is the author of Programming Scala, Second Edition, the co-author of Programming Hive, and the author of Functional Programming for Java Developers, all from O’Reilly. Dean contributes to several open-source projects and organizes several Big Data and Scala user groups in Chicago. Dean can be found on Twitter @deanwampler.
©2015, O'Reilly Media, Inc. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.