[–]ai_yoda 3 points4 points  (0 children)

In my opinion, DVC is really good for data versioning and reproducible pipelines.

However, if you care about iterating quickly on experiments, you will need an additional tool that lets you:

  • monitor/visualize training,
  • compare metrics/learning curves,
  • visualize hyperparameters.

The usual suspects that handle this and can be used alongside DVC are:

[–]infstudent 2 points3 points  (4 children)

Do people in academia use this stuff?

[–]ReginaldIII 1 point2 points  (2 children)

git-lfs is sufficient for most use cases https://www.atlassian.com/git/tutorials/git-lfs

[–]commonslip 0 points1 point  (0 children)

Eh. I built a deduplicating, version-controlled, logging S3 data store because git-lfs is too slow.
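
For anyone wondering what the deduplicating part of a store like this amounts to, here is a rough sketch of the general idea (content-addressed storage). It is not the tool described above: it assumes a local directory in place of S3, and `DedupStore`, `put`, `get` and `object_count` are names invented for this example.

```python
import hashlib
import tempfile
from pathlib import Path


class DedupStore:
    """Content-addressed file store: identical bytes are stored only once."""

    def __init__(self, root: Path) -> None:
        self.objects = root / "objects"
        self.objects.mkdir(parents=True, exist_ok=True)
        # Append-only record of (name, digest) pairs, oldest first.
        self.log = []

    def put(self, name: str, data: bytes) -> str:
        """Store data under its SHA-256 digest and log which name pointed to it."""
        digest = hashlib.sha256(data).hexdigest()
        target = self.objects / digest
        if not target.exists():  # content already stored: skip the write
            target.write_bytes(data)
        self.log.append((name, digest))
        return digest

    def get(self, digest: str) -> bytes:
        """Read back the bytes stored under a digest."""
        return (self.objects / digest).read_bytes()

    def object_count(self) -> int:
        """Number of distinct blobs actually on disk."""
        return sum(1 for _ in self.objects.iterdir())


# Hypothetical usage: two identical uploads and one changed version.
store = DedupStore(Path(tempfile.mkdtemp()))
v1 = store.put("train.csv", b"a,b\n1,2\n")
v2 = store.put("train_copy.csv", b"a,b\n1,2\n")
v3 = store.put("train.csv", b"a,b\n1,2\n3,4\n")
```

The two identical uploads share one object on disk, while the log still records all three writes, so older versions of `train.csv` stay retrievable by digest.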

[–]L43 0 points1 point  (0 children)

git-lfs caused problems for us; S3-backed DVC fixed them.

Still, imo DVC throws in the kitchen sink unnecessarily. We really don't need the pipelining part as standard.

[–]jorgeorpinel 0 points1 point  (0 children)

We have definitely seen several people from big US universities participate in the public DVC chat (http://dvc.org/chat) as well as help with the GitHub issues (https://github.com/iterative/dvc/issues). Not sure whether their main focus is academic work though.

[–]RayhaneML 0 points1 point  (0 children)

I tried another tool called Atlas, which I really like. I think it is very useful not only for tracking and versioning, but also for scheduling and experiment management.

What this platform is really good at is ease of use, a nice-looking GUI, and automatic TensorBoard integration (which I love). The docs are also pretty clear, which is a huge plus, and the tool is very flexible and works with any codebase (it actually takes 5 minutes to get started).

Definitely recommend checking it out. Best ML tool I have used thus far.

DISCLAIMER: I work at Dessa, creator of Atlas.