Put AI to Work
April 15-18, 2019
New York, NY

A software accelerator for machine learning

Vinay Rao (RocketML), Santi Adavani (RocketML)
1:50pm–2:30pm Wednesday, April 17, 2019
Models and Methods
Location: Grand Ballroom West
Secondary topics:  Models and Methods, Platforms and infrastructure
Average rating: *****
(5.00, 2 ratings)

Who is this presentation for?

  • Chief data officers, data scientists, and machine learning engineers



Prerequisite knowledge

  • Basic knowledge of machine learning terminology

What you'll learn

  • Discover why GPUs are not always required for accelerating machine learning training times
  • Understand how software optimizations can yield significant speedup
  • Learn how it's possible to achieve competitive training speeds with CPU-only architectures


Over the past decade, the industry has invested heavily in hardware accelerators like GPUs and TPUs. While these investments are essential to solving hard machine learning problems, they aren’t sufficient to close the gap created by growing data sizes and the death of Moore’s law. In their Turing lecture, John Hennessy and David Patterson, two legends of computer science, claim that software optimizations alone can speed up computations by a factor on the order of 61K. In practice, however, that is hard to achieve, and even a 10x improvement would be a great win. RocketML has done exactly that.

Vinay Rao and Santi Adavani explain why software architectures will lead the next generation of machine learning approaches. They take you through the history of distributed machine learning and hardware architectures, detail the breakthroughs that have been made in software, and demonstrate how software-only approaches can ultimately scale better.

Vinay and Santi share a parallelized implementation of the L-BFGS algorithm on a distributed system that includes a cluster of commodity computing machines. RocketML uses the open source HPCC (high-performance computing cluster) Systems platform as the underlying distributed system to implement the L-BFGS algorithm. They offer an overview of the HPCC Systems framework and explain how it allows for the parallel and distributed computations important for big data analytics; they then detail their own implementation of the L-BFGS algorithm on this platform. Experimental results show that this large-scale implementation of the L-BFGS algorithm can easily scale from training models with millions of parameters to models with billions of parameters by simply increasing the number of commodity computational nodes.
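To make the idea of a data-parallel L-BFGS concrete, here is a minimal single-process sketch in plain Python. It is not RocketML's implementation and does not use the HPCC Systems platform: the data shards merely stand in for cluster nodes, the per-shard sum stands in for an all-reduce, and every name in it (shard_loss_grad, total_loss_grad, two_loop, lbfgs) along with the toy least-squares problem is an assumption made for illustration.

```python
from collections import deque

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shard_loss_grad(w, shard):
    """Loss and gradient of 0.5 * sum((w.x - y)^2) on one shard (one 'node')."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in shard:
        r = dot(w, x) - y
        loss += 0.5 * r * r
        for j, xj in enumerate(x):
            grad[j] += r * xj
    return loss, grad

def total_loss_grad(w, shards):
    """Sum the per-shard results; stands in for an all-reduce across nodes."""
    loss, grad = 0.0, [0.0] * len(w)
    for shard in shards:
        l, g = shard_loss_grad(w, shard)
        loss += l
        grad = [a + b for a, b in zip(grad, g)]
    return loss, grad

def two_loop(grad, history):
    """L-BFGS two-loop recursion: approximate (inverse Hessian) * grad."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):            # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        alphas.append((a, rho))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if history:                               # scale by the latest curvature
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), (a, rho) in zip(history, reversed(alphas)):  # oldest first
        b = rho * dot(y, q)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return q

def lbfgs(shards, dim, memory=5, max_iter=100, tol=1e-10):
    w = [0.0] * dim
    history = deque(maxlen=memory)            # keeps only the last (s, y) pairs
    loss, grad = total_loss_grad(w, shards)
    for _ in range(max_iter):
        if dot(grad, grad) < tol:
            break
        direction = [-d for d in two_loop(grad, history)]
        slope = dot(grad, direction)
        step = 1.0
        while True:                           # backtracking (Armijo) line search
            w_new = [wi + step * di for wi, di in zip(w, direction)]
            loss_new, grad_new = total_loss_grad(w_new, shards)
            if loss_new <= loss + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        y = [a - b for a, b in zip(grad_new, grad)]
        if dot(s, y) > 1e-12:                 # keep only positive-curvature pairs
            history.append((s, y))
        w, loss, grad = w_new, loss_new, grad_new
    return w

# Toy data generated from y = 2*x0 - 3*x1 + 0.5, split across 4 simulated nodes.
data = [((float(i), float(i * i % 7), 1.0),
         2.0 * i - 3.0 * (i * i % 7) + 0.5) for i in range(20)]
shards = [data[k::4] for k in range(4)]
w = lbfgs(shards, dim=3)
print(w)
```

In this sketch only the loss and gradient evaluation touches the data, so that is the part that would be spread across nodes; the two-loop recursion works on a handful of parameter-sized vectors and is independent of how many shards the data is split into.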

Vinay Rao


Vinay Rao is the cofounder and CEO of RocketML, a machine learning platform on a mission to lead and enable the world’s transformation toward artificial intelligence. RocketML implements bleeding-edge learning algorithms that perform at scale, delivering “near-real-time” training performance on any data size.

Santi Adavani


Santi Adavani is a cofounder at RocketML, where he and his team are building a superfast engine for building machine learning models. Previously, Santi was a product manager and software development lead in the Technology and Manufacturing Group at Intel. He holds a PhD in computational sciences from the University of Pennsylvania. His areas of expertise include high-performance computing, nonlinear optimization, partial differential equations, machine learning, and big data.