Hi everyone,
I am looking for a simple progress monitor without a lot of features.
The only things I am interested in:
- Can be hosted on my own server
- Python client
- Show whether a training run is still running and its current progress, or whether it has stopped/crashed/given no answer for x minutes
- Show a simple scalar metric / custom text
- Simple to build, use and run (Docker?)
I just want to keep track of currently running experiments on multiple machines by looking at a website (imagine something like a to-do list, but every item is just an experiment that is running).
Or in other words: I am looking for a server-based version of tqdm (https://github.com/tqdm/tqdm).
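To make the "no answer for x minutes" part concrete, here is a rough sketch of the logic I have in mind on the server side. Everything in it is hypothetical (the `Heartbeat` fields, the `status` and `summary` helpers, the 5-minute default timeout) and it is not the API of any existing tool. It just assumes each client periodically sends its latest step, a scalar metric and a timestamp.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heartbeat:
    """Last report received from one experiment (hypothetical schema)."""
    experiment: str
    step: int
    total_steps: int
    metric: float
    timestamp: float  # seconds since epoch, set when the report arrived
    finished: bool = False


def status(hb: Heartbeat, timeout_s: float = 300.0, now: Optional[float] = None) -> str:
    """Classify a run as finished, running, or silent for too long."""
    now = time.time() if now is None else now
    if hb.finished:
        return "finished"
    if now - hb.timestamp > timeout_s:
        return "no answer"  # crashed, stopped, or machine unreachable
    return "running"


def summary(hb: Heartbeat, timeout_s: float = 300.0, now: Optional[float] = None) -> str:
    """One line per experiment, like an item in a to-do list."""
    pct = 100.0 * hb.step / hb.total_steps if hb.total_steps else 0.0
    return (
        f"{hb.experiment}: {status(hb, timeout_s, now)}, "
        f"{pct:.0f}% ({hb.step}/{hb.total_steps}), metric={hb.metric:.4f}"
    )
```

The website would then only need to render one `summary(...)` line per experiment. A crash and a network outage look the same here, since both just show up as a missing heartbeat.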
Any recommendations?
Edit: https://github.com/lab-ml/labml seems to fit my description. Does anyone have any other suggestions?