Geospatial datasets and systems were introduced at Monsanto over a decade ago, and their significance and use have only increased over time. Moreover, the volume and variety of geospatially tagged datasets being collected are increasing exponentially. However, the systems in use today have struggled to keep up with this ever-increasing demand. To address this, the Monsanto Data Platform Architecture and Engineering team set out to create a scalable geospatial platform in the cloud using only open source components. The result is a fully scalable geospatial platform that is used across the globe to process geospatial datasets for both visualization and analytics services.
Naghman Waheed and Martin Mendez-Costabel explain how Monsanto built this platform, focusing on the technical design and build of the entire system and covering the technical architecture, how and why the team chose certain open source components, and the lessons learned along the way. Naghman and Martin also highlight the value derived from the new platform through examples of how the system is being used to provide analytics on top of large geospatial datasets.
The entire platform was designed with several key architecture and engineering principles in mind: it needed to use open source, be instantiated in the AWS cloud, be easily scalable for both processing and storage needs, have automated monitoring and recovery from failure, and integrate with existing technologies such as the API gateway and identity management. The platform also supports a pay-as-you-use model, with spend visibility and accountability passed back to the user of the platform.
The platform was built using open source software running on AWS services, including CKAN as the data search catalog, GeoServer as the geospatial processing engine, QGIS as the visualization tool, and S3, Amazon Elastic File System, PostGIS, and AWS ECS for data storage and processing. The platform is fully integrated with AKAN and VDS (virtual directory service) and uses the OAuth 2.0 security model.
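To make the component roles concrete, here is a minimal Python sketch of how a client might address the catalog and the processing engine. It is not the team's actual integration: the base URLs, the layer name, and the helper function names are hypothetical, and only the request shapes (the CKAN Action API `package_search` call, a WFS 2.0.0 `GetFeature` request of the kind GeoServer serves, and an OAuth 2.0 bearer token header) follow the public interfaces of those tools. The sketch only builds requests and sends nothing over the network.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; the abstract does not give the real ones.
CKAN_BASE = "https://catalog.example.com"
GEOSERVER_BASE = "https://geo.example.com/geoserver"


def ckan_search_url(query, rows=10):
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/api/3/action/package_search?{params}"


def wfs_getfeature_url(type_name, bbox, srs="EPSG:4326", count=100):
    """Build a WFS 2.0.0 GetFeature URL that asks for GeoJSON features.

    bbox is (minx, miny, maxx, maxy). Axis order depends on how the CRS is
    identified and on server configuration, so verify it against the target
    server before relying on it.
    """
    minx, miny, maxx, maxy = bbox
    params = urlencode({
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",
        "count": count,
        "bbox": f"{minx},{miny},{maxx},{maxy},{srs}",
    })
    return f"{GEOSERVER_BASE}/wfs?{params}"


def bearer_headers(access_token):
    """Return the HTTP header carrying an OAuth 2.0 bearer token."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # "demo:fields" is a made-up workspace:layer name for illustration.
    print(ckan_search_url("soil moisture"))
    print(wfs_getfeature_url("demo:fields", (-91.0, 38.0, -90.0, 39.0)))
    print(bearer_headers("example-token"))
```

In a setup like the one described, the resulting URLs and headers would be passed to an HTTP client, with the API gateway validating the token before the request reaches CKAN or GeoServer.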
Naghman Waheed is the data platforms lead at Bayer Crop Science, where he’s responsible for defining and establishing enterprise architecture and direction for data platforms. Naghman is an experienced IT professional with over 25 years of work devoted to the delivery of data solutions spanning numerous business functions, including supply chain, manufacturing, order to cash, finance, and procurement. Throughout his 20+ year career at Bayer, Naghman has held a variety of positions in the data space, ranging from designing several scale data warehouses to defining a data strategy for the company and leading various data teams. His broad range of experience includes managing global IT data projects, establishing enterprise information architecture functions, defining enterprise architecture for SAP systems, and creating numerous information delivery solutions. Naghman holds a BA in computer science from Knox College, a BS in electrical engineering from Washington University, an MS in electrical engineering and computer science from the University of Illinois, and an MBA and a master’s degree in information management, both from Washington University.
Martin Mendez-Costabel leads the geospatial data asset team for Monsanto’s Products and Engineering organization within the IT Department, where he drives the engineering and adoption of global geospatial data assets for the enterprise. He has more than 12 years of experience in the agricultural sector covering a wide range of precision agriculture-related roles, including data scientist and GIS manager for E&J Gallo Winery in California. Martin holds a BSc in agronomy from the National University of Uruguay and two viticulture degrees: an MSc from the University of California, Davis, and a PhD from the University of Adelaide in Australia.
©2017, O'Reilly Media, Inc.