Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Magellan: Scalable and fast geospatial analytics

Ram Sriharsha (Databricks)
1:50pm–2:30pm Thursday, March 8, 2018
Average rating: 4.75 (4 ratings)

Who is this presentation for?

  • Data scientists, geospatial engineers, and developers

Prerequisite knowledge

  • A basic understanding of geospatial analytics and Spark

What you'll learn

  • Explore Magellan, a geospatial optimization engine that seamlessly integrates with Spark
  • Learn how to extend Spark SQL to build a "big data compiler" for your frontend engine

Description

How do you scale geospatial analytics on big data? And while you’re at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan—a geospatial optimization engine that seamlessly integrates with Spark—and explains how it provides scalability and performance without sacrificing simplicity.

By leveraging space-filling curves and indexing geometric shapes on the fly, Magellan is able to compute massive geospatial joins scalably while providing a level of abstraction to the end user that hides the complexities of indexing, join optimizations, etc. Magellan has also been benchmarked to be among the fastest geospatial engines even on a single node. Ram outlines the design considerations of Magellan, how it is able to achieve scalability for geospatial analytics without sacrificing simplicity and expressiveness, how it can achieve blazingly fast single-node performance even with the usual framework overheads of Spark on a single node, and what's next for the project.
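The filter-and-refine idea behind a space-filling-curve join can be sketched in plain Python. This is not Magellan's API or implementation. It is a hypothetical, single-machine illustration that assumes all geometries lie inside one fixed bounding box, uses a uniform grid at a chosen precision (`bits`), and uses Z-order (Morton) codes as the space-filling curve. Every function name here is made up for illustration. Each point gets one cell code, each polygon is covered by the codes of the cells its bounding box touches, candidate pairs are those that share a code, and an exact point-in-polygon test removes the false positives.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the ring is closed implicitly
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed to contain all data


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Z-order (Morton) code: x bits go to even positions, y bits to odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def cell_of(p: Point, bounds: Bounds, bits: int) -> Tuple[int, int]:
    """Map a point to its integer grid cell on a 2^bits x 2^bits grid (clamped to the grid)."""
    xmin, ymin, xmax, ymax = bounds
    n = 1 << bits
    cx = max(0, min(int((p[0] - xmin) / (xmax - xmin) * n), n - 1))
    cy = max(0, min(int((p[1] - ymin) / (ymax - ymin) * n), n - 1))
    return cx, cy


def z_index(p: Point, bounds: Bounds, bits: int) -> int:
    cx, cy = cell_of(p, bounds, bits)
    return interleave_bits(cx, cy, bits)


def cover(poly: Polygon, bounds: Bounds, bits: int) -> List[int]:
    """Coarse cover: Z-order codes of every cell overlapping the polygon's bounding box."""
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    x0, y0 = cell_of((min(xs), min(ys)), bounds, bits)
    x1, y1 = cell_of((max(xs), max(ys)), bounds, bits)
    return [interleave_bits(cx, cy, bits)
            for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Exact refinement step: ray casting with the even-odd rule."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: Dict[str, Point], polygons: Dict[str, Polygon],
                 bounds: Bounds, bits: int = 4) -> List[Tuple[str, str]]:
    """Filter on shared Z-order codes (an equi-join on an integer key), then refine exactly."""
    index: Dict[int, List[str]] = defaultdict(list)
    for name, poly in polygons.items():
        for code in cover(poly, bounds, bits):
            index[code].append(name)
    matches = []
    for pid, p in points.items():
        for name in index.get(z_index(p, bounds, bits), []):
            if point_in_polygon(p, polygons[name]):
                matches.append((pid, name))
    return matches


bounds = (0.0, 0.0, 10.0, 10.0)
polygons = {"A": [(1, 1), (4, 1), (4, 4), (1, 4)], "B": [(6, 6), (9, 6), (9, 9), (6, 9)]}
points = {"p1": (2, 2), "p2": (7, 8), "p3": (5, 5)}
print(spatial_join(points, polygons, bounds))  # [('p1', 'A'), ('p2', 'B')]
```

Joining on an integer cell code turns a geometric join into an ordinary equi-join, which a distributed engine can partition by key, so the expensive exact test only runs on the small candidate set. The bounding-box cover and fixed precision are simplifications chosen for this sketch; a tighter cover would produce fewer false-positive candidates.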


Ram Sriharsha

Databricks

Ram Sriharsha is the product manager for Apache Spark at Databricks and an Apache Spark committer and PMC member. Previously, Ram was architect of Spark and data science at Hortonworks and principal research scientist at Yahoo Labs, where he worked on scalable machine learning and data science. He holds a PhD in theoretical physics from the University of Maryland and a BTech in electronics from the Indian Institute of Technology, Madras.


Comments

Art Covert | DATA ENGINEER
03/09/2018 2:28am PST

Thanks for the excellent talk! I learned a lot and I'm looking forward to getting a version of Magellan with the Python API so I can play with it on our stack.

I was wondering if you had any good references for other implementations of z-order curves? We have some non-Spark pieces of the stack that may also benefit from approximate matches with z-order curves.