Data version control with DVC


Offline s.arman

« on: April 21, 2019, 02:30:42 AM »
DataOps is very important in data science, and in my opinion data scientists should pay more attention to it. It is one of the least-used practices in data science projects. At the moment we normally version code (with something like Git), and more people and organizations are starting to version their models. But what about data?

In an upcoming article, I’ll cover in detail how to use Git with DVC and other tools to version almost everything that goes into a data science (and scientific) project.

Recently, the DVC project creator Dmitry Petrov gave an interview to Tobias Macey at Podcast.__init__, a Python podcast. In this blog post, I provide a transcript of the interview. You might find the ideas behind DVC, and how Dmitry sees the future of data science and data engineering, interesting.

You can hear the podcast here:

Version Control For Machine Learning Projects

An interview with the creator of DVC about how it improves collaboration and reduces duplicate effort on data science…
www.pythonpodcast.com   
TL;DR
We need to pay more attention to how we organize our work and how we structure our projects, and we need to find the places where we waste our time instead of doing actual work. It is very important to be more organized and more productive as a data scientist, because today we are still in the Wild West.

The transcript
Disclaimer: This transcript is the result of listening to the podcast and writing down what I heard. I used some software to help with the transcription, but most of the work was done by my ears and hands, so if you can improve this transcription, please feel free to leave a comment below :)
Tobias:
Your host as usual is Tobias Macey, and today I’m interviewing Dmitry Petrov about DVC, an open source version control system for machine learning projects. So Dmitry, could you start by introducing yourself?

For more: https://towardsdatascience.com/data-version-control-with-dvc-what-do-the-authors-have-to-say-3c3b10f27ee?fbclid=IwAR3LZTqwlXWubf132vbsESPsEn-r1zO4HfSzlciop1Msav2IRGnlcrZ3eG8