Apache Spark is the most popular open source project in big data. While many users initially come to Spark for its performance, they stay for the expressiveness of the APIs and ease of use of the engine.
Herman van Hövell tot Westerflier looks back at the history of data systems, from filesystems, databases, and big data systems (e.g., MapReduce) to “small data” systems (e.g., R and Python), covering the pros and cons of each, the abstractions they provide, and the engines underneath. Herman then shares lessons learned from this evolution, explains how Spark is developed, and offers a peek into the future of Spark.
Herman van Hövell tot Westerflier is a Spark committer working on Spark SQL at Databricks. Previously, Herman was a consultant working for clients in banking, manufacturing, and logistics. His interests include database systems, optimization, and simulation.
©2017, O’Reilly UK Ltd