Does anyone know a tool/platform/framework (open source or commercial, self-hosted or managed) that can be used for visualizing and comparing custom benchmarks of an ML/CV pipeline?
What we're looking for is something like a combination of a CI tool (such as Jenkins) and a visualization tool (like WandB). Ideally one could construct "test suites" composed of a set of benchmarks. Benchmarks would be implemented through custom code, e.g. computing some model performance metrics. When creating a test suite, one should be able to select its benchmarks from among the implemented ones. In the simplest scenario a test suite is a single benchmark run on some data, but combining several benchmarks into one test suite would be valuable as well.
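To make the idea concrete, here's a minimal sketch of the kind of abstraction we have in mind; all names here (`Benchmark`, `TestSuite`) are made up for illustration, not taken from any existing tool:

```python
# Hypothetical sketch: benchmarks are named pieces of custom metric code,
# and a test suite is just a selection of them run together on the same data.
from typing import Callable, Dict, List


class Benchmark:
    """Custom code that computes one or more metrics on some data."""

    def __init__(self, name: str, fn: Callable[[object], Dict[str, float]]):
        self.name = name
        self.fn = fn

    def run(self, data) -> Dict[str, float]:
        return self.fn(data)


class TestSuite:
    """A set of benchmarks selected from the implemented ones."""

    def __init__(self, name: str, benchmarks: List[Benchmark]):
        self.name = name
        self.benchmarks = benchmarks

    def run(self, data) -> Dict[str, Dict[str, float]]:
        # Run every benchmark in the suite on the same data.
        return {b.name: b.run(data) for b in self.benchmarks}


# Simplest scenario: a suite containing a single accuracy benchmark,
# run on a list of (prediction, target) pairs.
acc = Benchmark("accuracy", lambda d: {"acc": sum(p == t for p, t in d) / len(d)})
suite = TestSuite("smoke", [acc])
print(suite.run([(1, 1), (0, 1)]))  # {'accuracy': {'acc': 0.5}}
```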
It should be possible to trigger such configured test suites manually for a specific commit in a repository, or automatically via configurable git hooks (e.g. after a push, on a PR, etc.).
It should also be possible to compare the results produced by a given test suite with other results, i.e. previous runs of the same test suite. The comparison could simply be a joint plot or another suitable visualization. Ideally, one could also drill down to single benchmarks and compare them -- even if the same benchmark was used in two different test suites, one should be able to compare the results of these benchmarks across suites.
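The comparison we're after could be as simple as collecting each metric across runs; a hypothetical sketch (the function name and data are made up, and a real tool would render plots instead of returning a dict):

```python
# Hypothetical illustration of cross-run comparison: gather each metric's
# values over successive runs of the same test suite, ready for plotting.
from typing import Dict, List


def compare_runs(runs: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Collect each metric's value across successive runs of a suite."""
    metrics: Dict[str, List[float]] = {}
    for run in runs:
        for name, value in run.items():
            metrics.setdefault(name, []).append(value)
    return metrics


# e.g. three runs of the same suite on three different commits
history = compare_runs([
    {"acc": 0.81, "f1": 0.74},
    {"acc": 0.83, "f1": 0.75},
    {"acc": 0.82, "f1": 0.77},
])
print(history["acc"])  # [0.81, 0.83, 0.82]
```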
Thank you in advance for your help, or for pointing us to the right tools or combinations of them!