Deep neural networks (DNNs) are extraordinarily versatile artificial intelligence models that require substantial computing resources in both training and deployment. By operationalizing trained DNNs on a cloud-based Hadoop ecosystem, data engineers can dynamically scale cluster size to achieve and maintain desired evaluation throughput rates for changing workloads.
Using an aerial image classification use case, Mary Wahl demonstrates how DNNs created in popular deep learning frameworks, such as Microsoft’s Cognitive Toolkit (CNTK) and Google’s TensorFlow, can be deployed on Microsoft HDInsight Spark clusters to efficiently partition evaluation tasks across worker nodes and minimize data transfer latency from HDFS (Azure Data Lake Store).
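As a rough illustration of partitioning evaluation work across worker nodes, the sketch below scores one partition of records while loading the model only once for that partition. It is not taken from the session: `load_model` and `evaluate_partition` are hypothetical names, and the toy "model" simply returns the index of the largest input value in place of a real CNTK or TensorFlow network.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def load_model() -> Callable[[List[float]], int]:
    """Hypothetical stand-in for a framework-specific model loader.

    Returns a toy 'model' that predicts the index of the largest feature.
    """
    def predict(features: List[float]) -> int:
        return max(range(len(features)), key=lambda i: features[i])
    return predict


def evaluate_partition(
    records: Iterable[Tuple[str, List[float]]]
) -> Iterator[Tuple[str, int]]:
    """Score every record in one partition, loading the model only once."""
    model = load_model()  # cost is paid once per partition, not once per record
    for name, features in records:
        yield name, model(features)


if __name__ == "__main__":
    # Local stand-in for one partition of (tile name, feature vector) records.
    sample = [("tile_a", [0.1, 0.7, 0.2]), ("tile_b", [0.9, 0.05, 0.05])]
    print(list(evaluate_partition(sample)))
```

On a Spark cluster, a function with this shape (iterator in, iterator out) could be passed to `RDD.mapPartitions`, so that each worker loads the model once and streams its share of the records through it.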
Most deep learning frameworks offer built-in minibatching functionality, including associated methods for data deserialization and preprocessing. A user would be remiss not to take advantage of these efficient functions during training, but their requirements (loading input data from disk, proprietary file formatting) may be unacceptable when applying a trained model to new data. For example, web services or worker nodes on Hadoop ecosystem clusters should process input data directly without writing to disk. Users may therefore need to recreate for deployment the loading and preprocessing steps that their deep learning framework’s built-in methods performed during training. Mary covers the most insidious and common errors she has encountered with that process.
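The abstract does not list the specific errors, so the following is only a sketch of the kind of preprocessing that may need to be recreated by hand for in-memory input. It assumes, purely for illustration, a model trained on BGR images in channels-first (CHW) layout with per-channel mean subtraction; the function name `preprocess_for_evaluation` and its parameters are hypothetical, and the actual channel order, layout, and normalization depend on the framework and on how the model was trained.

```python
import numpy as np


def preprocess_for_evaluation(
    image_hwc_rgb: np.ndarray, channel_means_bgr: np.ndarray
) -> np.ndarray:
    """Convert a decoded in-memory image into the tensor an assumed model expects.

    Assumptions (illustrative only): input is height x width x 3 in RGB order;
    the model expects float32, BGR order, mean-subtracted, channels-first.
    """
    if image_hwc_rgb.ndim != 3 or image_hwc_rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Cast before any arithmetic: subtracting from uint8 would wrap around.
    image = image_hwc_rgb.astype(np.float32)
    # Reverse the channel axis: RGB -> BGR (the assumed training order).
    image = image[:, :, ::-1]
    # Subtract per-channel means; broadcasting applies them along the last axis.
    image = image - channel_means_bgr.astype(np.float32)
    # Reorder axes from height-width-channel to channel-height-width.
    return np.ascontiguousarray(image.transpose(2, 0, 1))
```

A mismatch in any one of these steps (channel order, axis layout, dtype, or normalization) typically raises no error and only shows up as degraded predictions, which is why comparing the deployed output against the training-time reader on a few known inputs is a sensible check.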
Mary Wahl is a data scientist on Microsoft’s AI for Earth team, which helps NGOs apply deep learning to problems in conservation biology and environmental science. Mary has also worked on computer vision and genomics projects as a member of Microsoft’s algorithms and data science solutions team in Boston. Previously, Mary studied recent human migration, disease risk estimation, and forensic reidentification using crowdsourced genomic and genealogical data as a Harvard College Fellow.