Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Software Accelerator for Machine Learning

Vinay Rao (RocketML Inc), Santi Adavani (RocketML Inc)
1:50pm–2:30pm Wednesday, April 17, 2019
Models and Methods
Location: Grand Ballroom West
Secondary topics: Models and Methods, Platforms and Infrastructure

Who is this presentation for?

Chief Data Officers, Data Scientists, Machine Learning Engineers



Prerequisite knowledge

Basic knowledge of machine learning terminology

What you'll learn

- GPUs are not always required for accelerating machine learning training times
- Software optimizations can yield significant speedups
- Competitive training speeds are achievable with CPU-only architectures


Over the past decade, the industry has invested heavily in hardware accelerators like GPUs and TPUs. While these investments are essential to solving hard machine learning problems, they are not sufficient to close the gap created by growing data sizes and the end of Moore’s law. In their Turing lecture, two legends of computer science, John Hennessy and David Patterson, claim that it is possible to speed up computations through software optimizations alone, on the order of 61,000 times. In practice, however, this is a hard task; even a 10x improvement would be a great win. RocketML has done exactly that.

Vinay Rao, CEO, and Santi Adavani, CTO, of RocketML explain why software architectures will lead the next generation of machine learning approaches. The speakers take you through the history of both distributed machine learning and hardware architectures, detail the breakthroughs that have been made in software, and demonstrate how software-only approaches can ultimately scale better.

In this talk, we present a parallelized implementation of the L-BFGS algorithm on a distributed system comprising a cluster of commodity machines. We use the open source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system for implementing the L-BFGS algorithm. We first give an overview of the HPCC Systems framework and how it enables the parallel and distributed computations important for big data analytics; we then explain our implementation of the L-BFGS algorithm on this platform. Our experimental results show that our large-scale implementation of L-BFGS easily scales from training models with millions of parameters to models with billions of parameters simply by increasing the number of commodity computational nodes.
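To make the idea concrete, here is a minimal, hypothetical sketch of data-parallel L-BFGS in plain Python. It is not the RocketML or HPCC Systems implementation (that platform uses its own ECL language); it only illustrates the structure the abstract describes: each "worker" holds a shard of the data and computes a partial gradient, the partials are summed (standing in for an all-reduce across commodity nodes), and the driver applies the standard L-BFGS two-loop recursion with a backtracking line search. All function and variable names are invented for this sketch.

```python
# Hypothetical sketch of data-parallel L-BFGS (illustrative only, not the
# RocketML/HPCC Systems code). Fits a least-squares model whose data is
# split into shards, mimicking gradient computation distributed over nodes.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(t, x, y):  # t*x + y, elementwise
    return [t * xi + yi for xi, yi in zip(x, y)]

def partial_grad(shard, w):
    # gradient of sum_i (x_i . w - y_i)^2 over one worker's shard
    g = [0.0] * len(w)
    for x, y in shard:
        r = dot(x, w) - y
        g = [gi + 2.0 * r * xi for gi, xi in zip(g, x)]
    return g

def full_grad(shards, w):
    # stands in for an all-reduce: sum per-worker partial gradients
    g = [0.0] * len(w)
    for shard in shards:
        g = [a + b for a, b in zip(g, partial_grad(shard, w))]
    return g

def full_loss(shards, w):
    return sum((dot(x, w) - y) ** 2 for shard in shards for x, y in shard)

def two_loop(grad, history):
    # standard L-BFGS two-loop recursion over stored (s, y) pairs
    q = list(grad)
    stack = []
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        stack.append((a, rho, s, y))
        q = axpy(-a, y, q)
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)  # initial Hessian scaling H0 = gamma*I
        q = [gamma * qi for qi in q]
    for a, rho, s, y in reversed(stack):
        b = rho * dot(y, q)
        q = axpy(a - b, s, q)
    return [-qi for qi in q]  # descent direction

def lbfgs(shards, w, iters=50, m=5):
    g = full_grad(shards, w)
    history = []
    for _ in range(iters):
        d = two_loop(g, history)
        # backtracking (Armijo) line search on the aggregated loss
        t, fw, gd = 1.0, full_loss(shards, w), dot(g, d)
        while full_loss(shards, axpy(t, d, w)) > fw + 1e-4 * t * gd and t > 1e-10:
            t *= 0.5
        w_new = axpy(t, d, w)
        g_new = full_grad(shards, w_new)
        s = [a - b for a, b in zip(w_new, w)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(y, s) > 1e-12:  # curvature condition before storing the pair
            history = (history + [(s, y)])[-m:]
        w, g = w_new, g_new
    return w

# Toy data generated from w* = [2, -3], split across two "workers"
shards = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], -3.0)],
    [([1.0, 1.0], -1.0), ([2.0, 1.0], 1.0)],
]
w = lbfgs(shards, [0.0, 0.0])  # converges to roughly [2, -3]
```

The key scaling property the abstract points to is visible here: only `partial_grad` touches raw data, so adding nodes (shards) grows data capacity without changing the driver-side two-loop recursion, whose cost depends on model size and history length, not data size.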


Vinay Rao

RocketML Inc

Vinay Rao is the founder and CEO of RocketML, a machine learning platform on a mission to lead and enable the world’s transformation toward artificial intelligence (AI).


Santi Adavani


Santi Adavani cofounded RocketML, where his team is building a superfast engine for building machine learning models. Previously, Santi worked as a product manager and software development lead in Intel’s technology and manufacturing group. He holds a PhD in computational sciences from the University of Pennsylvania. His areas of expertise include high-performance computing, nonlinear optimization, partial differential equations, machine learning, and big data.
