Many companies use machine learning today; ML teams are growing, and ML projects are becoming more complex. Establishing a well-defined, manageable process becomes a central issue in this environment. Versioning ML models and datasets is an essential first step toward establishing a good process.
Source code versioning tools are mature today, and software engineering best practices are well defined. However, these tools and practices do not fit well into the ML workflow. ML requires managing models and large dataset files and tying them to code for reproducibility, a task that traditional tools like Git do not handle well.
We will discuss open source tools for versioning ML models and datasets, starting with traditional Git, moving through tools like Git-LFS (git-lfs.github.com) and Git-annex (git-annex.branchable.com), and ending with an ML-specific tool, Data Version Control (DVC.org).
A data scientist from Silicon Valley with a Ph.D. in computer science. Former data scientist at Microsoft.
Now co-founder and CEO of Iterative AI, a startup in San Francisco. We create tools for machine learning and data versioning.
MS in CS. Former team lead for the open-source project sedna.org. Co-founded The Tweeted Times, a company that was acquired by Yandex in 2011. Recently has been working on tools for data scientists at Iterative.ai as CTO.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com