Unshattering the mirror: Defragmenting the deep learning ecosystem
Who is this presentation for?
- Directors of AI, VPs of engineering, deep learning (DL) developers, ML engineers, and data scientists
Level
Intermediate
Description
Companies like Google, Facebook, Amazon, and Microsoft have comprehensive internal tooling that enables their developers to collaborate and build end-to-end AI applications. These platforms allow developers to be insanely productive and deliver new AI features and applications orders of magnitude faster than the rest of the industry. Meanwhile, developers at every other organization on the planet are left piecing this infrastructure together with a combination of highly specialized point solutions, legacy systems not designed for modern AI workflows, and half-baked open source projects. These tools lack standard interfaces and file formats, and they’re often incompatible in surprising ways.
Evan Sparks details the deficiencies of existing reference architectures for AI development infrastructure and the opportunities for end-to-end system design in AI development, with deep dives into two examples. First, integrating cluster resource management and fine-grained scheduling with hyperparameter optimization yields orders-of-magnitude improvements in training performance and convergence to better models. Second, workload-aware checkpointing enables seamless fault tolerance, autoscaling, and rapid collaboration.
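To make the first example concrete, here is a minimal sketch of how a hyperparameter search can cooperate with a scheduler and with checkpoints. The abstract does not name an algorithm, so this uses successive halving as a well-known stand-in, not as the method presented in the talk. The Trial class, the train function, and the toy loss formula are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hyperparameter configuration plus its resumable training state."""
    lr: float
    steps_done: int = 0          # acts as the checkpoint: where to resume from
    loss: float = float("inf")


def train(trial: Trial, extra_steps: int) -> None:
    """Toy stand-in for training: resume from the checkpoint, run extra_steps more.

    The loss formula is invented for illustration. It improves with more steps
    and is lowest when lr is close to 0.1.
    """
    trial.steps_done += extra_steps
    trial.loss = (trial.lr - 0.1) ** 2 + 1.0 / (1 + trial.steps_done)


def successive_halving(lrs, initial_steps=1, rounds=3):
    """Train every trial briefly, keep the better half, double the budget, repeat."""
    trials = [Trial(lr=lr) for lr in lrs]
    budget = initial_steps
    total_steps = 0
    for _ in range(rounds):
        for t in trials:
            train(t, budget)     # survivors resume from their checkpoint
            total_steps += budget
        trials.sort(key=lambda t: t.loss)
        # Stopped trials would release their cluster resources at this point.
        trials = trials[: max(1, len(trials) // 2)]
        budget *= 2
    return trials[0], total_steps


best, used = successive_halving([0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.9, 1.0])
print(best.lr, best.steps_done, used)
```

In this toy run the surviving trial reaches 7 steps while the search spends 24 training steps in total, compared with 56 if all eight configurations were trained for 7 steps each. The saving depends on survivors resuming from saved state instead of restarting, which is where checkpointing and scheduling meet.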
Prerequisite knowledge
- A working knowledge of the model development process
- A high-level understanding of the existing tooling in the DL ecosystem for tasks like model development, hyperparameter optimization, model compression, and training cluster management
What you'll learn
- Gain a better understanding of the gaps in current publicly available AI development infrastructure and a good sense of what you should demand from this infrastructure

Evan Sparks
Determined AI
Evan Sparks is a cofounder and CEO of Determined AI, a software company that makes machine learning engineers and data scientists fantastically more productive. Previously, Evan worked in quantitative finance and web intelligence. He holds a PhD in computer science from the University of California, Berkeley, where, as a member of the AMPLab, he contributed to the design and implementation of much of the large-scale machine learning ecosystem around Apache Spark, including MLlib and KeystoneML. He also holds an AB in computer science from Dartmouth College.