2015 was the year of data science and platformization for Apache Spark. With new high-level APIs (e.g., DataFrames, ML Pipelines, and R) and extension points, Spark became accessible to a wider set of users and able to plug in a myriad of data sources, algorithms, and external packages. 2015 also marked the beginning of Project Tungsten, a major revamp of Spark’s execution engine to improve its robustness and performance.
In 2016, Spark will continue pushing the boundaries of these dimensions, making it more powerful and even easier to use. Reynold Xin outlines three trends for the immediate future, discussing the major efforts to address them and exploring their implications for Spark users. These trends include:
Reynold Xin is a cofounder and chief architect at Databricks, as well as an Apache Spark PMC member and the release manager for Spark 2.0. Before joining Databricks, Reynold was pursuing a PhD at the UC Berkeley AMPLab, where he worked on large-scale data processing.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.