October 28–31, 2019

Modular convolution considered beneficial

Jack Chung (AMD), Chao Liu (AMD), Daniel Lowell (AMD)
4:10pm–4:50pm Thursday, October 31, 2019
Location: Grand Ballroom C/D

Who is this presentation for?

  • Algorithm developers




MIOpen contains the performance-critical GPU kernels that drive machine learning workloads on the AMD ROCm platform. Jack Chung, Chao Liu, and Daniel Lowell explore how to break these kernels into modular pieces so they can be easily tuned for different AMD GPU hardware and tightly integrated with graph compilers such as TensorFlow XLA. They show how various convolution algorithms are implemented on AMD hardware, how they’re decomposed into modular pieces, how XLA can pick up and fuse those pieces, and how the results perform.
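To make the idea of decomposing a convolution into modular pieces concrete, here is a minimal sketch in plain Python. It is an illustration only, not MIOpen's kernels or API: the function names (`im2col`, `gemv`, `conv2d_modular`, `conv2d_direct`) are hypothetical, and the im2col-plus-matrix-product split is just one well-known decomposition, assumed here for the example. As is usual in deep learning, "convolution" means cross-correlation (the filter is not flipped), with no padding and a stride of 1.

```python
# Illustrative sketch only: these names and this decomposition are assumptions,
# not MIOpen's actual kernels or API.

def im2col(image, kh, kw):
    """Piece 1 (data movement): unfold every kh x kw window into one row."""
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            for i in range(h - kh + 1)
            for j in range(w - kw + 1)]

def gemv(matrix, vector):
    """Piece 2 (math): a plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def conv2d_modular(image, kernel):
    """Compose the two pieces into a 2D convolution (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    flat_out = gemv(im2col(image, kh, kw), flat_kernel)
    # Fold the flat result back into rows of the output image.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]

def conv2d_direct(image, kernel):
    """Monolithic reference: the same computation written as one loop nest."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

result = conv2d_modular(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```

Because the data-movement piece and the math piece are separate functions, each could in principle be tuned or swapped independently, or combined with neighboring operations by a compiler, while the composed result stays identical to the monolithic version.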

Prerequisite knowledge

  • A basic understanding of the math behind convolution

What you'll learn

  • Discover how TensorFlow uses ROCm and MIOpen, what's inside a convolution algorithm, which operations are common to different flavors of convolution algorithms, and how XLA uses device functions

Jack Chung


Wen-Heng (Jack) Chung is a PMTS software development engineer at AMD, where he’s been working on the ROCm stack since its inception. He has experience in compiler frontends, optimization passes, and runtimes for high-level languages. His focus has been TensorFlow XLA.


Chao Liu


Chao Liu is a software developer at AMD, where he works on the open source high-performance deep learning library MIOpen. His interests include the development of parallel algorithms and numerical methods for a variety of applications, including deep learning and physics-based simulation. Previously, he developed techniques for computational fluid dynamics, finite element analysis, iterative solvers, and mesh generation on shared- and distributed-memory machines.


Daniel Lowell


Daniel Lowell is the team lead and software architect for MIOpen, AMD’s deep learning GPU kernel library. Previously, he worked at AMD Research in the high-performance computing (HPC) arena, in compiler technology and reliability. His interests include deep learning, brain-machine interfaces, autocode generation, and HPC.

