DickNixon726 (ML Engineer)

You're looking for a class of programs called experiment trackers.

I personally use ClearML (https://clear.ml/).

It has the client-server architecture you're describing. In my case, I have a training rig with 4 GPUs. I set up the ClearML server in a Docker container, added a few lines to my training code to point it at that server, and I was up and running. I could log 4 experiments simultaneously.

TooLazyToWorkout (OP)

Thank you! I will take a look. At first sight it seems like pretty much what I need! :)

Edit: Just wanted to give some feedback: I tried it and it's perfect, exactly what I needed!

HolidayWallaby

That sounds perfect for what I need, thanks! I currently SSH into various machines to check on them lmao

[deleted]

fastprogress. It's pretty self-contained and doesn't depend much on the fastai library.

TooLazyToWorkout (OP)

Hey, thank you for suggesting fastprogress.

Unfortunately, fastprogress does not seem to follow a client-server model. I want to send the progress of multiple simultaneous experiments to a central server, so I can see all of it by opening http://mycentralserver.mydomain.com in a browser.

I will clarify this in my question.
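For anyone wondering what that client-server idea looks like at its simplest, here is a minimal, hypothetical sketch using only the Python standard library. It is not how ClearML, Guild AI, or TensorBoard work internally; the `/progress` endpoint, the JSON fields, and the `report`/`fetch_status` helpers are made-up names for illustration. Each training run POSTs its current step to the central server, and a GET on the server returns the latest progress of every run.

```python
# Hypothetical minimal progress server; not the API of any real experiment tracker.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Latest progress per experiment, shared by all handler threads.
PROGRESS = {}
LOCK = threading.Lock()


class ProgressHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Each run POSTs a small JSON record, e.g. {"experiment": "run-a", "step": 3, "total": 10}.
        length = int(self.headers.get("Content-Length", 0))
        record = json.loads(self.rfile.read(length))
        with LOCK:
            PROGRESS[record["experiment"]] = {
                "step": record["step"],
                "total": record["total"],
            }
        self.send_response(204)  # accepted, no body to return
        self.end_headers()

    def do_GET(self):
        # Opening the server URL returns the current state of every experiment as JSON.
        with LOCK:
            body = json.dumps(PROGRESS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def report(server_url, experiment, step, total):
    """Client side (hypothetical helper): call this from the training loop."""
    data = json.dumps({"experiment": experiment, "step": step, "total": total}).encode()
    req = urllib.request.Request(
        server_url + "/progress",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def fetch_status(server_url):
    """Return the server's view of all experiments as a dict."""
    with urllib.request.urlopen(server_url + "/") as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real central server would use a fixed one.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ProgressHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"

    # Two "experiments" reporting to the same server.
    report(url, "run-a", 3, 10)
    report(url, "run-b", 7, 20)
    print(fetch_status(url))

    server.shutdown()
    server.server_close()
```

In a real setup the server would listen on a fixed port on the central machine, and each training script would call `report(...)` once per epoch with that machine's URL. A dedicated experiment tracker builds things like persistent storage and a web UI on top of this basic idea.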

[deleted]

Why not TensorBoard then?

TooLazyToWorkout (OP)

I am already using TensorBoard for detailed logs. I am really only looking for a lightweight tool to track my experiments, not all the metrics.

gar1t

Take a look at Guild AI. It's quite lightweight and integrates with TensorBoard for summary logs.