There are many good approaches to designing a new deep learning or AI cluster. If your algorithms are composed entirely of calls to standard AI libraries, it's easy to design a few architectures, try them in the cloud, and pick the best design for your needs. However, if there are no supporting libraries, creating even two or three prototype architectures can be prohibitively costly and time-consuming.
Art Popp walks you through a “from scratch” implementation of two algorithms to demonstrate the tools available for original algorithm development, using both SIMD and SIMT designs, whose leading hardware architectures are Xeon Phi and NVIDIA CUDA. Along the way, Art explores the performance per watt, performance per dollar (initial cost), and performance per dollar (TCO) of each. Each computation camp has its merits. Art's goal is to give you a peek down each fork in the road and a plan for determining the best direction with the least wasted effort.
Art Popp is the senior hardware test engineer at ServiceNow. Previously, Art spent 25 years in the telecommunications industry, the last eight as the principal architect of a large telco carrier's engineering data warehouse ecosystem, which grew into a mixed environment with 4 PB of IBM TwinFin appliances (formerly Netezza), 20+ racks of Hadoop gear, and dozens of racks of reporting and presentation systems. (Over time the focus of this environment shifted from reporting to predicting, which is where it got fun.)
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com