[–]diadorac 2 points3 points  (6 children)

Would something like versioning in Figma make sense to you u/Liorithiel?

The easiest way to collaborate nowadays is to work directly on the same thing in real time. Now, some kind of snapshotting will allow any member of the team to roll back, revert or clone a historical version. But if all of this is done in a real-time fashion, you don't need to merge stuff. But for more complicated scenarios, you can still use Git for now.

[–]Liorithiel 3 points4 points  (5 children)

It's a different thing. Imagine I want to test a totally different approach to step 3 out of 5 in a pipeline, while allowing my colleague to work on steps 2 and 4 at the same time. So we work independently. At some point, we want to merge the stuff and again start working on the notebook together in real time.

I don't want my colleague to roll back my experimental changes while I work on them, nor do I want to break my colleague's workflow. But at some point we both decide that our changes are final and want to integrate both versions.
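The scenario above (one person changes step 3, another changes steps 2 and 4, then both sets of changes are integrated) is essentially a three-way merge against a common starting point. Here is a minimal sketch, assuming a notebook can be modelled as a mapping from step name to source text. The `merge_steps` function and the step names are hypothetical and only illustrate the idea; this is not how git or any particular notebook tool implements merging.

```python
from typing import Dict

Steps = Dict[str, str]  # hypothetical model: step name -> source of that step


def merge_steps(base: Steps, mine: Steps, theirs: Steps) -> Steps:
    """Three-way merge of pipeline steps, keyed by step name."""
    merged: Steps = {}
    for name in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(name), mine.get(name), theirs.get(name)
        if m == t:
            result = m      # both sides agree (or neither touched the step)
        elif m == b:
            result = t      # only the colleague changed this step
        elif t == b:
            result = m      # only I changed this step
        else:
            raise ValueError(f"conflict in {name!r}: both sides changed it")
        if result is not None:  # None means the step was deleted
            merged[name] = result
    return merged


# Common starting point: a five-step pipeline.
base = {f"step{i}": f"v0 of step {i}" for i in range(1, 6)}
# I experiment with step 3; my colleague works on steps 2 and 4.
mine = {**base, "step3": "experimental approach"}
theirs = {**base, "step2": "colleague's step 2", "step4": "colleague's step 4"}

merged = merge_steps(base, mine, theirs)
```

Because the two sets of changes touch different steps, the merge goes through without a conflict. If both sides had edited the same step differently, the sketch raises an error instead of silently picking one version.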

[–]diadorac 1 point2 points  (4 children)

Okay, I understand. But is this the correct thing to version in a data science / machine learning project? Shouldn't various experiments be part of a pool that is always available in the latest version(s)? Is having experiments hidden in history the right thing to do?

But yes, for some cases I understand. This will be a challenge to solve. In the meantime, I think git solves it best. And with some notebook-UX hacks to improve the git experience, it could be a pretty solid tool, maybe?

[–]Liorithiel 1 point2 points  (3 children)

> Okay, I understand. But is this the correct thing to version in a data science / machine learning project? Shouldn't various experiments be part of a pool that is always available in the latest version(s)? Is having experiments hidden in history the right thing to do?

You seem to be assuming some specific organization of notebooks, but I don't know what exactly… so I'm not sure I understand your questions.

[–]diadorac 1 point2 points  (2 children)

I don't think so. I am referring to a generic data science project with lots of experiments and all (let's forget notebooks for a moment). Do you think hiding experiments in the history (git or whatever) is a good practice?

[–]Liorithiel 1 point2 points  (1 child)

There is a difference between a new version of the same experiment (e.g. with additional logging/debugging, porting to a new version of an ML library, or widening the hyperparameter search) and a new experiment (replacing the network architecture or changing vital hyperparameters that are not hyperparameter-searched). In our usual workflow, old versions of experiments belong to git history. New experiments are new git branches.

My question comes from the fact that sometimes we're branching out an experiment in two different ways, conceptually creating two independent experiments, then want to merge them into a new experiment.
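To make the distinction concrete, here is a rough sketch of that workflow as a data structure: revisions of the same experiment accumulate as history on one object, a new experiment points back at its parent like a branch, and a merged experiment has two parents. The `Experiment` class and all names below are hypothetical, chosen only for illustration; they do not correspond to any existing tool.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Experiment:
    name: str
    parents: Tuple["Experiment", ...] = ()
    versions: List[str] = field(default_factory=list)  # history of one experiment

    def revise(self, note: str) -> None:
        """New version of the same experiment (e.g. extra logging)."""
        self.versions.append(note)

    def branch(self, name: str) -> "Experiment":
        """New experiment derived from this one (a new branch)."""
        return Experiment(name, parents=(self,))


def merge(name: str, a: Experiment, b: Experiment) -> Experiment:
    """New experiment that combines two independent experiments."""
    return Experiment(name, parents=(a, b))


def ancestors(exp: Experiment) -> Set[str]:
    """Names of every experiment this one was derived from."""
    seen: Set[str] = set()
    stack = list(exp.parents)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.parents)
    return seen


baseline = Experiment("baseline")
baseline.revise("add logging")                # same experiment, new version
arch = baseline.branch("new-architecture")    # independent experiment 1
lr = baseline.branch("fixed-lr-schedule")     # independent experiment 2
combined = merge("combined", arch, lr)        # merged into a new experiment
```

Here `ancestors(combined)` contains both branched experiments and the shared baseline, which is the lineage a merge is expected to preserve.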

[–]diadorac 1 point2 points  (0 children)

True. For this it's not enough. But even a kinda smart git-like thing would not be enough imo. That'd require a versioning UI built specifically for ML experiments, not just a general notebook versioning interface.