Put AI to Work
April 15-18, 2019
New York, NY

Open source tools for machine learning model and dataset versioning

Dmitry Petrov (Iterative AI), Ivan Shcheklein (Iterative AI)
4:55pm-5:35pm Thursday, April 18, 2019
Implementing AI
Location: Rendezvous
Secondary topics:  Data and Data Networks
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and managers



Prerequisite knowledge

  • A basic understanding of machine learning and source code version control (Git, Mercurial, SVN, etc.)

What you'll learn

  • Explore best engineering practices in machine learning, particularly for ML model and dataset versioning


Today, many companies are using machine learning, and ML teams are growing along with the complexity of ML projects. In this environment, establishing a well-defined, manageable process has become a central issue. Versioning ML models and datasets is an essential first step toward such a process.

Although source code versioning tools are mature and software engineering best practices are well defined, these tools and practices don't fit well into the ML workflow. ML requires managing models and large dataset files and tying them to the code that produced them for reproducibility, a task that traditional tools like Git handle poorly.

Dmitry Petrov and Ivan Shcheklein explore open source tools for ML model and dataset versioning, from traditional Git to tools like Git-LFS and Git-annex and the ML project-specific tool Data Version Control (DVC.org).
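
To illustrate the general idea of keeping large files out of the repository while still tying them to the code, here is a minimal Python sketch of a content-addressed cache with small pointer files. It is not the implementation of Git-LFS, Git-annex, or DVC; the function names (`file_digest`, `track`, `restore`), the `.ptr.json` suffix, and the pointer layout are all hypothetical and chosen only for this example.

```python
import hashlib
import json
import shutil
from pathlib import Path

# Hypothetical pointer-file suffix, chosen for this sketch only.
POINTER_SUFFIX = ".ptr.json"


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The small pointer file is what would be committed alongside the code;
    the large file itself stays in the cache.
    """
    digest = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + POINTER_SUFFIX)
    meta = {"sha256": digest, "size": data_file.stat().st_size}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(pointer.name[: -len(POINTER_SUFFIX)])
    shutil.copy2(cache_dir / meta["sha256"], target)
    return target
```

In this sketch, checking out an older commit would bring back an older pointer file, and `restore` would then fetch the matching dataset or model version from the cache.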


Dmitry Petrov

Iterative AI

Dmitry Petrov is cofounder and CEO at Iterative AI, where he’s working on tools for machine learning and data versioning. An ex-data scientist at Microsoft and an active open source contributor, Dmitry wrote and open-sourced the first version of the DVC.org project and implemented a wavelet-based image hashing algorithm (wHash) in open source library ImageHash for Python. He holds a PhD in computer science.


Ivan Shcheklein

Iterative AI

Ivan Shcheklein is cofounder and CTO at Iterative AI, where he’s working on tools for data scientists. Previously, he was team lead for open source project Sedna.org and cofounded the Tweeted Times (acquired by Yandex in 2011). He holds an MS in CS.
