October 28–31, 2019

Modular convolution considered beneficial

Jack Chung (AMD), Chao Liu (AMD), Daniel Lowell (AMD)
4:10pm–4:50pm Thursday, October 31, 2019
Location: Grand Ballroom C/D

Who is this presentation for?

  • Algorithm developers

Level

Intermediate

Description

MIOpen contains performance-critical GPU kernels that drive machine learning workloads on the AMD ROCm platform. Jack Chung, Chao Liu, and Daniel Lowell explore how to break these kernels into modular pieces so they can be easily tuned for various AMD GPU hardware and tightly integrated with graph compilers such as TensorFlow XLA. They show how various convolution algorithms are implemented on AMD hardware, how they're decomposed into modular pieces, how those pieces can be picked up and fused by XLA, and how they perform.
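One classic way to decompose a convolution into reusable pieces, in the spirit of what the talk describes, is to split it into a data-layout transform (im2col) followed by a plain matrix multiply (GEMM); each piece can then be tuned or fused independently. The sketch below is not the library's actual implementation, just a minimal NumPy analogy for a single-channel 2D case.

```python
import numpy as np

def im2col(x, kh, kw):
    """Modular piece 1: unfold every kh-by-kw input patch into a column."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_via_gemm(x, k):
    """Compose the pieces: im2col, then a GEMM (here a plain matmul).

    This computes convolution as used in deep learning (no kernel flip).
    """
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (k.ravel() @ im2col(x, kh, kw)).reshape(oh, ow)
```

Because the transform and the GEMM are separate stages, a graph compiler can in principle fuse neighboring operations into either stage rather than treating the convolution as one opaque kernel.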

Prerequisite knowledge

  • A basic understanding of the math behind convolution

What you'll learn

  • Discover how TensorFlow uses ROCm and MIOpen, what's inside a convolution algorithm, the common operations across different flavors of convolution algorithms, and how XLA uses device functions

Jack Chung

AMD

Wen-Heng (Jack) Chung is a PMTS software development engineer at AMD, where he's been working on the ROCm stack since its inception. He has experience in compiler frontends, optimization passes, and runtimes for high-level languages. His current focus is TensorFlow XLA.


Chao Liu

AMD

Chao Liu is a software developer at AMD, where he works on the open source high-performance deep learning library MIOpen. His interests include the development of parallel algorithms and numerical methods for a variety of applications, including deep learning and physics-based simulation. Previously, he developed techniques for computational fluid dynamics, finite element analysis, iterative solvers, and mesh generation on shared- and distributed-memory machines.


Daniel Lowell

AMD

Daniel Lowell is the team lead and software architect for MIOpen, AMD's deep learning GPU kernel library. Previously, he worked at AMD Research in the high-performance computing (HPC) arena, on compiler technology and reliability. His interests include deep learning, brain-machine interfaces, automatic code generation, and HPC.


Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

sponsorships@oreilly.com

For information on exhibiting or sponsoring a conference

pr@oreilly.com

For media/analyst press inquiries