Microsoft’s recently formed AI for Earth team helps NPOs apply AI to challenges in conservation biology and environmental science. Mary Wahl and Banibrata De highlight AI for Earth’s recent work with the Chesapeake Conservancy, a nonprofit organization charged with monitoring natural resources in the Chesapeake Bay watershed, a region of more than 165,000 square kilometers in the eastern US extending from New York to Virginia. The Chesapeake Conservancy has meticulously assigned each square meter in this region a land cover label such as water, grass/herbaceous, tree/forest, or barren/impervious surface. The resulting high-quality, high-resolution land cover map is a valuable resource already used by environmental scientists and conservation efforts throughout the region. If similar maps could be generated at frequent timepoints, the data would allow researchers to quantify trends such as deforestation, urbanization, and the impacts of climate change. Unfortunately, the method used by the Chesapeake Conservancy required substantial manual curation and postprocessing: the cost, time, and effort involved severely limit how frequently the map can be updated.
In collaboration with the Chesapeake Conservancy and ESRI, AI for Earth’s team helped train a deep neural network model to predict land cover from a single high-resolution aerial imagery data source collected nationwide at frequent intervals. The team produced a neural network similar in architecture to Ronneberger et al.’s U-Net, a commonly used type of semantic segmentation model. After prototyping the training method in CNTK on a single-GPU Azure Geo AI Data Science Virtual Machine (DSVM) with a subset of available data, the team scaled up training to a 148-GPU cluster using Azure Batch AI. This transition reduced the average duration of each training epoch 40-fold (scaling stopped being near-linear at cluster sizes above 64 workers) and further reduced runtime by allowing the complete ~1 TB training dataset to be stored in memory rather than repeatedly accessed from disk. The trained model’s predictions are in excellent agreement with the conservancy’s land cover labels; many discrepancies are due to outdated information in the “ground truth” labels used during training.
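To put the quoted speedup in perspective, the short calculation below computes parallel efficiency, the fraction of ideal linear speedup actually achieved. It is only a rough sketch: it assumes the 40-fold figure is measured against a single-GPU baseline on comparable data, which the abstract does not state explicitly, and the function name parallel_efficiency is a hypothetical helper.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return the fraction of ideal linear speedup achieved by `workers` workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted in the abstract: 40-fold faster epochs on a 148-GPU cluster.
# Assumption: the baseline is one GPU processing a comparable workload.
efficiency = parallel_efficiency(speedup=40.0, workers=148)
print(f"parallel efficiency: {efficiency:.2f}")
```

An efficiency well below 1 would be consistent with the remark that scaling stopped being near-linear above 64 workers.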
Mary and Banibrata walk you through the end-to-end guide they published in early March describing their methods for training and applying this model. You’ll explore the use case and see the work in the context of the Azure resources in which it was developed. Along the way, Mary and Banibrata describe the source data and the special challenges imposed by its very large file size and paired format, how the team handled common problems like label imbalance and partially missing labels in the training data, and the modifications required to enable distributed training in the script. They also highlight the speedups and other benefits achieved using data-parallel distributed training on a GPU cluster. They conclude by sharing how they surfaced this model for use in ArcGIS, a common geographic information system (GIS) software suite, so that its predictions can be shown in real time alongside aerial imagery as users scroll through a region of interest.
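One common way to handle label imbalance and partially missing labels in a segmentation loss is to weight each class by its inverse frequency and to skip unlabeled pixels entirely. The sketch below illustrates that idea in plain Python. It is not the team's CNTK training code, and the names IGNORE_LABEL, inverse_frequency_weights, and masked_weighted_cross_entropy are hypothetical, as is the choice of -1 to mark a missing label.

```python
import math

IGNORE_LABEL = -1  # hypothetical marker for pixels with no ground-truth label


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by the inverse of its share of the labeled pixels."""
    counts = [0] * num_classes
    for label in labels:
        if label != IGNORE_LABEL:
            counts[label] += 1
    total = sum(counts)
    # Classes that never appear get weight 0, which also avoids dividing by zero.
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def masked_weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean cross-entropy computed over labeled pixels only.

    probs: one list of class probabilities per pixel (each summing to 1).
    labels: one class index per pixel, or IGNORE_LABEL where the label is missing.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for pixel_probs, label in zip(probs, labels):
        if label == IGNORE_LABEL:
            continue  # unlabeled pixels contribute nothing to the loss
        w = class_weights[label]
        loss_sum += -w * math.log(max(pixel_probs[label], 1e-12))
        weight_sum += w
    return loss_sum / weight_sum if weight_sum else 0.0


# Toy example: class 0 is three times as common as class 1; one pixel is unlabeled.
labels = [0, 0, 0, 1, IGNORE_LABEL]
probs = [[0.5, 0.5]] * len(labels)
weights = inverse_frequency_weights(labels, num_classes=2)
loss = masked_weighted_cross_entropy(probs, labels, weights)
```

With these weights the rare class contributes as much to the loss in total as the common one, and the unlabeled pixel has no effect on the result.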
Mary Wahl is a data scientist on Microsoft’s AI for Earth team, which helps NGOs apply deep learning to problems in conservation biology and environmental science. Mary has also worked on computer vision and genomics projects as a member of Microsoft’s algorithms and data science solutions team in Boston. Previously, Mary studied recent human migration, disease risk estimation, and forensic reidentification using crowdsourced genomic and genealogical data as a Harvard College Fellow.
Banibrata De is a seasoned software engineer in Microsoft’s Algorithms and Data Science Group in Redmond, where he is the engineer for the Data Science Virtual Machine (DSVM) and works on a variety of solutions that help democratize AI and ML using cutting-edge tools. Previously, he worked with the Windows Defender team to help protect Microsoft customers against various security vulnerabilities and was a performance software engineer for key Microsoft services and products, helping to make the end-user experience enjoyable. He holds a degree in computer science from Jadavpur University, Kolkata, India.
©2018, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.