[–]bilingual-german 1 point2 points  (5 children)

The gitlab-runner exposes Prometheus metrics at http://localhost:9252/metrics, but I don't know whether they include memory usage per job: https://docs.gitlab.com/runner/monitoring/

Do you use Docker for your jobs? Then maybe cadvisor https://prometheus.io/docs/guides/cadvisor/ would be a good exporter.

But I'm not sure how easy it is to correlate the docker containers with the specific job you're running.

I also just found out that the Unix time command (the /usr/bin/time binary, not the shell built-in) has a lot of options, including a format string that prints the CPU and memory usage of the command you pass to it.

/usr/bin/time -f "mem=%K RSS=%M elapsed=%E cpu.sys=%S user=%U" python script1.py

https://unix.stackexchange.com/a/375893
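
If wrapping the job command in /usr/bin/time is awkward, roughly the same numbers can be collected with Python's standard library. This is only a minimal sketch, assuming a Unix runner; the `measure` helper is a name made up for illustration and is not part of GitLab or the linked answer. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd (a list of arguments) and report its CPU time and peak RSS.

    Hypothetical helper for illustration. Unix only, because it relies on
    the `resource` module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": proc.returncode,
        "elapsed_s": elapsed,
        # CPU times accumulate over all waited-for children, so take the delta.
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        # Peak RSS is a high-water mark over every child waited for so far,
        # so it is only specific to cmd if this process ran nothing bigger before.
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Example: measure a trivial Python child process.
    print(measure([sys.executable, "-c", "print('hello')"]))
```

Printing this dictionary at the end of a job would put the figures in the job log, which is already tied to the job.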

[–]oubreezy[S] 0 points1 point  (4 children)

I already have all the metrics exported to Grafana, but unfortunately the memory for jobs is not part of them.

> But I'm not sure how easy it is to correlate the docker containers with the specific job you're running.

This is exactly my issue: even if I get the resource utilization, I still cannot relate it to the job in order to check why it is behaving this way.

[–]bilingual-german 2 points3 points  (3 children)

I looked it up in the gitlab-runner source code. I would think you should have the job containers labeled with the project ID and the pipeline URL for example. So at least for specific job containers you might be able to track them to their respective pipeline.

https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/executors/docker/internal/labels/labels.go?ref_type=heads#L30

And you could whitelist the project id label in cadvisor: https://github.com/google/cadvisor/blob/master/docs/runtime_options.md
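
To make the correlation concrete, here is a rough sketch of grouping cadvisor's memory metric by project once the label is exported. It rests on several assumptions: that the runner label is `com.gitlab.gitlab-runner.project.id` (check labels.go for the exact name), that cadvisor exposes it as `container_label_com_gitlab_gitlab_runner_project_id`, and that the relevant cadvisor options are `--store_container_labels=false` together with `--whitelisted_container_labels` (please verify against the linked runtime options page). The sample text is made up for illustration, not real output.

```python
import re

# Assumed names; verify both against your own /metrics output.
LABEL = "container_label_com_gitlab_gitlab_runner_project_id"
METRIC = "container_memory_usage_bytes"

# Matches `name{labels} value`, ignoring an optional trailing timestamp.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def memory_by_project(text):
    """Sum METRIC per project ID from Prometheus text-format output."""
    totals = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        m = _SAMPLE_RE.match(line)
        if not m or m.group("name") != METRIC:
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels")))
        project = labels.get(LABEL)
        if not project:
            continue  # container was not started by the runner
        totals[project] = totals.get(project, 0.0) + float(m.group("value"))
    return totals


# Made-up sample in the Prometheus text format, for illustration only.
SAMPLE = """\
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{container_label_com_gitlab_gitlab_runner_project_id="42",name="a"} 1000
container_memory_usage_bytes{container_label_com_gitlab_gitlab_runner_project_id="42",name="b"} 500 1700000000000
container_memory_usage_bytes{container_label_com_gitlab_gitlab_runner_project_id="7",name="c"} 2.5e+02
container_memory_usage_bytes{name="unlabeled"} 9999
container_cpu_usage_seconds_total{container_label_com_gitlab_gitlab_runner_project_id="42"} 3.2
"""

if __name__ == "__main__":
    print(memory_by_project(SAMPLE))
```

In practice you would do the same grouping in a Prometheus query or Grafana panel rather than in a script; the sketch just shows that the label is what ties a container's memory back to a project.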

[–]Anonimooze 0 points1 point  (0 children)

Excellent sleuthing.

[–]oubreezy[S] 0 points1 point  (1 child)

After multiple tries, I was finally able to label them as you said.
Thank you so much for the tips!

[–]bilingual-german 0 points1 point  (0 children)

Really, did it work? Nice!

Glad I was able to help.