Put AI to Work

April 15-18, 2019
New York, NY

Open source tools for machine learning model and dataset versioning

Dmitry Petrov (Iterative AI), Ivan Shcheklein (Iterative AI)

4:55pm–5:35pm Thursday, April 18, 2019

Implementing AI
Location: Rendezvous

Secondary topics: Data and Data Networks

Who is this presentation for?

Data scientists, data engineers, and managers

Level

Intermediate

Prerequisite knowledge

A basic understanding of machine learning and source code version control (Git, Mercurial, SVN, etc.)

What you'll learn

Explore best engineering practices in machine learning, particularly for ML model and dataset versioning

Description

Today, many companies use machine learning, and ML teams are growing along with the complexity of ML projects. In this environment, establishing a well-defined, manageable process has become a central issue, and versioning ML models and datasets is an essential first step toward one.

Although source code versioning tools are mature and software engineering best practices are well defined, these tools and practices don’t fit well into the ML workflow. ML requires managing models and large dataset files and tying them to code for reproducibility, a task that traditional tools like Git handle poorly.

Dmitry Petrov and Ivan Shcheklein explore open source tools for ML model and dataset versioning, from traditional Git to tools like Git-LFS and Git-annex and the ML-specific tool Data Version Control (DVC.org).
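
To make the idea of versioning large files alongside code more concrete, here is a minimal sketch of one common approach: store the large file in a content-addressed cache and commit only a small pointer file to Git. This is an illustration under stated assumptions, not the actual format, API, or implementation of Git-LFS, Git-annex, or DVC. The function names, the `.ptr.json` suffix, and the cache layout are all hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer file (hypothetical ".ptr.json" suffix) holds only the hash
    and size, so it is small enough to commit to Git next to the code.
    """
    digest = file_digest(data_file)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    meta = {"sha256": digest, "size": data_file.stat().st_size}
    pointer.write_text(json.dumps(meta))
    return pointer


def restore(pointer: Path, cache_dir: Path, target: Path) -> None:
    """Recreate the data file described by a pointer file from the cache."""
    meta = json.loads(pointer.read_text())
    shutil.copy2(cache_dir / meta["sha256"], target)


def demo() -> bool:
    """Track a small file, delete it, and restore it from the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        original = b"x,y\n1,2\n"
        data.write_bytes(original)
        pointer = track(data, root / "cache")
        data.unlink()  # simulate a fresh checkout that has only the pointer
        restore(pointer, root / "cache", data)
        return data.read_bytes() == original
```

In this sketch, checking out an older commit would bring back an older pointer file, and `restore` would then fetch the matching dataset or model version from the cache. Real tools add remote storage, deduplication strategies, and pipeline tracking on top of this basic idea.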

Dmitry Petrov

Iterative AI

Dmitry Petrov is a creator of the open-source tool Data Version Control (DVC.org), a building block for MLOps infrastructure.

Dmitry is a former data scientist at Microsoft with a Ph.D. in Computer Science. Today, he is based in San Francisco working on tools for machine learning and data versioning as a Co-Founder and CEO of Iterative.AI.

Ivan Shcheklein

Iterative AI

Ivan Shcheklein is cofounder and CTO at Iterative AI, where he’s working on tools for data scientists. Previously, he was team lead for the open source project Sedna.org and cofounded the Tweeted Times (acquired by Yandex in 2011). He holds an MS in CS.

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com