Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Magellan: Scalable and fast geospatial analytics

Ram Sriharsha (Databricks)

1:50pm–2:30pm Thursday, March 8, 2018

Big data and data science in the cloud, Data science and machine learning
Location: LL20 A

Average rating: 4.75 (4 ratings)

Who is this presentation for?

Data scientists, geospatial engineers, and developers

Prerequisite knowledge

A basic understanding of geospatial analytics and Spark

What you'll learn

Explore Magellan, a geospatial optimization engine that seamlessly integrates with Spark
Learn how to extend Spark SQL to build a "big data compiler" for your frontend engine

Description

How do you scale geospatial analytics on big data? And while you’re at it, can you make it easy to use while achieving state-of-the-art performance on a single node? Ram Sriharsha offers an overview of Magellan—a geospatial optimization engine that seamlessly integrates with Spark—and explains how it provides scalability and performance without sacrificing simplicity.

By leveraging space-filling curves and indexing geometric shapes on the fly, Magellan can compute massive geospatial joins at scale while hiding complexities such as indexing and join optimization from the end user. Magellan has also been benchmarked to be among the fastest geospatial engines, even on a single node. Ram outlines Magellan's design considerations, explains how it scales geospatial analytics without sacrificing simplicity or expressiveness and how it achieves fast single-node performance despite Spark's usual framework overheads, and shares what’s next for the project.
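
To make the idea of joining on a space-filling curve concrete, here is a minimal Python sketch. It is not Magellan's implementation or API. It assumes a Z-order (Morton) curve over a fixed lon/lat grid, uses axis-aligned bounding boxes in place of real polygons, and every function name and the `precision` parameter are hypothetical. Each shape is covered with grid cells, each cell is keyed by its Z-order code, and the spatial join becomes an equality match on cell keys followed by an exact containment check.

```python
from collections import defaultdict


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return code


def grid_coords(lon: float, lat: float, precision: int):
    """Quantize lon/lat onto a grid with 2**precision cells per axis (assumed layout)."""
    n = 1 << precision
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


def cell_key(lon: float, lat: float, precision: int = 8) -> int:
    """Z-order key of the grid cell that contains the point."""
    col, row = grid_coords(lon, lat, precision)
    return interleave_bits(col, row, precision)


def cover_box(min_lon, min_lat, max_lon, max_lat, precision: int = 8):
    """Z-order keys of every grid cell overlapping the bounding box."""
    c0, r0 = grid_coords(min_lon, min_lat, precision)
    c1, r1 = grid_coords(max_lon, max_lat, precision)
    return {
        interleave_bits(c, r, precision)
        for c in range(c0, c1 + 1)
        for r in range(r0, r1 + 1)
    }


def index_join(points, boxes, precision: int = 8):
    """Join points to boxes: match on cell key first, then test containment exactly."""
    index = defaultdict(list)
    for name, box in boxes.items():
        for key in cover_box(*box, precision=precision):
            index[key].append(name)

    matches = []
    for pid, (lon, lat) in points.items():
        # Only shapes sharing the point's cell are candidates.
        for name in index.get(cell_key(lon, lat, precision), []):
            min_lon, min_lat, max_lon, max_lat = boxes[name]
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                matches.append((pid, name))
    return sorted(matches)


# Illustrative data only: two points and one rough bounding box.
points = {"sf": (-122.42, 37.77), "nyc": (-74.0, 40.71)}
boxes = {"california_bbox": (-124.5, 32.5, -114.1, 42.0)}
result = index_join(points, boxes)
```

Because the expensive geometric test runs only on pairs that share a cell key, the join can be expressed as an ordinary equi-join, which is the kind of operation a distributed engine already knows how to partition. A finer `precision` yields fewer false candidates at the cost of more keys per shape.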

Ram Sriharsha

Databricks

Ram Sriharsha is the product manager for Apache Spark at Databricks and an Apache Spark committer and PMC member. Previously, Ram was architect of Spark and data science at Hortonworks and principal research scientist at Yahoo Labs, where he worked on scalable machine learning and data science. He holds a PhD in theoretical physics from the University of Maryland and a BTech in electronics from the Indian Institute of Technology, Madras.

Comments

Art Covert | DATA ENGINEER

03/09/2018 2:28am PST

Thanks for the excellent talk! I learned a lot, and I’m looking forward to getting a version of Magellan with the Python API so I can play with it on our stack.

I was wondering if you had any good references for other implementations of z-order curves? We have some non-Spark pieces of the stack that may also benefit from approximate matches with z-order curves.
