You’ve trained machine learning models on your data, but how do you put them into production? Suppose you have tens of thousands of model versions, each written in any mix of languages and frameworks, from R, Java, and Ruby to scikit-learn, Caffe, and TensorFlow on GPUs, and each exposed as a REST API endpoint. Your users love to chain algorithms and run ensembles in parallel. How do you maintain a latency of less than 20 ms on just a few servers?
AI is a hot topic, with constant advances in what is possible, but the infrastructure and scaling challenges that come with it have received far less discussion. Algorithmia has built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework, and has faced and solved many of these challenges along the way.
Diego Oppenheimer explains why machine learning is a natural fit for serverless computing, discusses issues he ran into when implementing on-demand scaling over GPU clusters at Algorithmia, and provides general solutions and a vision for the future of cloud-based ML. Diego then shares a complete operating system for AI (a common interface through which different algorithms can be used and combined) and a general architecture for serverless machine learning that is discoverable, versioned, scalable, and sharable.
Diego Oppenheimer is the founder and CEO of Algorithmia. An entrepreneur and product developer with an extensive background in all things data, Diego has designed, managed, and shipped some of Microsoft’s most used data analysis products, including Excel, Power Pivot, SQL Server, and Power BI. Diego holds a bachelor’s degree in information systems and a master’s degree in business intelligence and data analytics from Carnegie Mellon University.
©2018, O’Reilly UK Ltd