Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY
Discover opportunities for applied AI
Organizations that successfully apply AI innovate and compete more effectively. How is AI transforming your business?

Open source tools for machine learning models and data sets versioning

Dmitry Petrov (Iterative AI), Ivan Shcheklein (Iterative AI)
4:55pm-5:35pm Thursday, April 18, 2019
Implementing AI
Location: Mercury Rotunda
Secondary topics: Data and Data Networks

Who is this presentation for?

Data scientists, data engineers, and managers

Level

Intermediate

Prerequisite knowledge

Basic understanding of machine learning and of source code version control (Git, Mercurial, SVN, etc.).

What you'll learn

Why machine learning needs sound engineering practices, and best practices for versioning ML models and datasets.

Description

Many companies use machine learning today, ML teams are growing, and ML projects are becoming more complex. In this environment, establishing a well-defined and manageable process becomes a central issue. Versioning ML models and datasets is an essential first step toward establishing a good process.

Source code versioning tools are mature today, and software engineering best practices are well defined. However, these tools and practices do not fit the ML workflow well. ML requires managing models and large dataset files and tying them to the code for reproducibility, which is exactly where traditional tools like Git fall short.

We will discuss open source tools for versioning ML models and datasets, starting with traditional Git, moving through tools like Git-LFS (git-lfs.github.com) and Git-annex (git-annex.branchable.com), and finishing with an ML-project-specific tool, Data Version Control (DVC.org).
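The common idea behind Git-LFS, and in a similar form DVC, is to keep large files out of Git and commit only a small text pointer that identifies the file by a content hash. The sketch below illustrates that idea, assuming the Git-LFS pointer format (spec v1: a `version` line, an `oid sha256:` line, and a `size` line). The function name `make_lfs_pointer` and the sample CSV are hypothetical and not part of any tool's API; real tools also upload the file to separate storage, which is omitted here.

```python
import hashlib
import os
import tempfile


def make_lfs_pointer(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: build a Git-LFS style (spec v1) pointer for `path`.

    Only this small text would be committed to Git; the large file itself
    lives elsewhere and is located later by its SHA-256 object id.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in chunks so a large dataset never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest.hexdigest()}\n"
        f"size {size}\n"
    )


if __name__ == "__main__":
    # Hypothetical dataset file, created only for this illustration.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
        tmp.write(b"id,label\n1,cat\n2,dog\n")
    print(make_lfs_pointer(tmp.name))
    os.remove(tmp.name)
```

Because the pointer changes whenever the data changes, committing it alongside the training code records exactly which dataset version a given model was built from.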


Dmitry Petrov

Iterative AI

A data scientist from Silicon Valley with a Ph.D. in computer science. Formerly a data scientist at Microsoft.

Now co-founder and CEO of Iterative AI, a San Francisco startup that creates tools for machine learning and data versioning.


Ivan Shcheklein

Iterative AI

MS in computer science. Former team lead for the open source project sedna.org. Co-founded The Tweeted Times, which was acquired by Yandex in 2011. Most recently, has been working on tools for data scientists as CTO of Iterative.ai.
