Apache Spark is the most popular open source project in big data. While many users initially come to Spark for its performance, they stay for the expressiveness of the APIs and ease of use of the engine.
Reynold Xin looks back at the history of data systems, from filesystems, databases, and big data systems (e.g., MapReduce) to “small data” systems (e.g., R and Python), covering the pros and cons of each, the abstractions they provide, and the engines underneath. Reynold then shares lessons learned from this evolution, explains how Spark is developed, and offers a peek into the future of Spark.
Reynold Xin is a cofounder and chief architect at Databricks as well as an Apache Spark PMC member and release manager for Spark’s 2.0 release. Prior to Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.
Comments on this page are now closed.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.