Specialized computing is increasingly popular for data center workloads that need high computational and I/O performance. This is particularly true of deep learning and machine learning, whose math libraries and primitives are normally offloaded to GPUs, FPGAs, and sometimes even ASICs.
However, most accelerators available today do not scale natively. They are isolated devices that rely on higher-level frameworks (such as Apache Spark) running on CPUs for scalability. The few accelerators that can be natively clustered (i.e., interconnected by a fabric without CPU intervention) scale topologically only to a limited degree. For instance, NVIDIA’s DGX-1 can go up to 8 GPUs meshed together with NVLink, a high-speed interconnect fabric. Moreover, DGX-1 is a developer’s solution that often requires careful partitioning and manual tuning to fully exploit the clustering performance.
Bharadwaj Pudipeddi proposes a highly dense, modular acceleration cluster that is completely disaggregated from generic servers in the data center and specifically targeted at deep learning and AI workloads. This cluster is scalable and lightweight (and devoid of Xeons), and it can run very deep neural networks through data and model parallelism for extreme performance. A low-level fabric minimizes data movement and supports scalability, resilience, and reconfigurability. The software (or middleware) for accelerating a wide range of workloads is designed to seamlessly support multiple frameworks, including Caffe and TensorFlow, as well as execution frameworks such as Apache Spark.
Bharadwaj demonstrates how this modular approach accelerates the most demanding applications (including training) and how the architecture suits extremely deep neural networks by virtue of avoiding the unnecessary synchronization and centralized control often found in a traditional, server CPU-controlled solution.
Bharadwaj Pudipeddi is the cofounder and CTO of NVXL, a company building a new clustered acceleration platform for deep learning, machine learning, and SQL workloads. A product entrepreneur and hardware architect, Bharadwaj previously worked at Intel and a number of startups in the areas of CPU design, high-performance fabrics, flash memory storage, and scalable computing.
©2017, O'Reilly Media, Inc.