Apache Spark 2.4 introduces many new features and improvements, including barrier execution mode, flexible streaming sinks, a native Avro data source, an eager evaluation mode for PySpark, Kubernetes support, higher-order functions, Scala 2.12 support, and more.
Xiao Li and Wenchen Fan offer an overview of the major features and enhancements in Apache Spark 2.4. Along the way, you’ll learn about the design and implementation of V2 of the Data Source API and catalog federation in the upcoming Spark release. Then you’ll get the chance to ask all your burning Spark questions.
Xiao Li is a software engineer at Databricks and an Apache Spark committer and PMC member. His main interests are Spark SQL, data replication, and data integration. Previously, he was an IBM Master Inventor and an expert on asynchronous database replication and consistency verification. He holds a PhD from the University of Florida.
Wenchen Fan is a software engineer at Databricks, where he works on Spark Core and Spark SQL. He is also an Apache Spark committer and PMC member. He mainly focuses on the Apache Spark open source community, leading the discussion and review of many features and fixes in Spark.
©2019, O'Reilly Media, Inc.