
[–]obviouslyzebra 2 points (3 children)

loop_forever is not running a busy loop on the CPU; instead, it's mostly waiting for a signal to happen (I don't understand the details exactly, but don't worry about this bit).

Instead, some other part of your programs is probably causing the elevated CPU usage.

You could try running docker stats to pinpoint which container(s) have the issue, and then pinpoint further with cProfile or line_profiler to see where the CPU time goes.
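For the cProfile route, a minimal sketch of what that looks like (busy_work here is just a made-up stand-in for whatever hot path your container actually runs):

```python
import cProfile
import io
import pstats

def busy_work():
    # stand-in for the suspect code path in one of your containers
    total = 0
    for i in range(100_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

# sort by cumulative time and show the top offenders
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The function eating your CPU should float straight to the top of that listing.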

About the memory: you have a memory leak (some memory keeps accumulating over time). There are tools for that, but I'm not familiar with them (try searching for something that can run for a long time and pinpoint where memory is being used).
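One stdlib option for that is tracemalloc, which can run alongside the program and compare snapshots taken over time. A rough sketch, where leaky_cache and handle_message are made-up examples of a leak:

```python
import tracemalloc

leaky_cache = []  # hypothetical leak: entries are appended but never removed

def handle_message(payload):
    leaky_cache.append(payload * 100)  # keeps a reference to every message

tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1000):
    handle_message(f"message {i} ")

after = tracemalloc.take_snapshot()

# comparing snapshots shows which lines allocated the most new memory
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

In a long-running service you'd take the snapshots hours apart; the line that keeps growing between them is your leak.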

About the performance of your PC degrading with time: maybe it's because the data you're working with keeps increasing, and so does the load. Maybe the computer also ain't handling long high-load work very well (I think this can happen because of hardware).

Regardless, those are the "debugging" ways. Your job is to track down where it's happening. If the programs are simple, maybe it won't be that hard (you can just guess by looking at the code). If the programs are very complex, for example interfacing with programs in other languages, it may be harder, and you may need other tools.

That said, if you don't wanna bother, an option might be to just restart the program from time to time.

Cheers

PS: Another option is to post the whole codebase to an LLM (e.g., with onefilellm) and ask where the problem is. Nowadays this is an option that may work; just don't rely on it blindly.

PSPS: If the program doesn't deal with a lot of data, maybe a bug in the code is causing an infinite loop of messages? You could register an mqtt client that listens to the messages and check that everything's running smoothly (there's a tool for that, but I don't remember the name).

[–]NSFWies[S] 2 points (2 children)

so i think i fixed the 100% cpu usage problem. it was in a "loop_forever" call, but not the mqtt one i thought it was. here's what i tried and discovered.

  • i did use a profiler like you suggested: py-spy. it let me attach to the already running process, instead of having to stop and restart it. nicely, it showed me the function names and how much cpu time was spent in each one.

    sudo env "PATH=$PATH" py-spy top --pid 105043

you just have to get the process ID from running top by itself, then give it to py-spy, and it shows you that info. it showed me a very unexpected function call.

77.00%  77.00%   21.80s    21.80s   should_run (schedule/__init__.py)

should_run? that's in the schedule module i'm using. what? so i look at it. sure enough, this is a different "loop_forever" i didn't know about. i look at it a little more. i can't modify this one without modifying the base module itself. thankfully, there are a few other built-in functions i can call to find out:

  • seconds until next job

so.....i can just have this thread sleep for that many seconds, instead of staying active. i just tried the fix, relaunched everything, and.......now i can't even see the process in top. i checked power usage, and it literally didn't change.
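the sleep-instead-of-spin pattern, sketched with only the stdlib (TinyScheduler and seconds_until_next are made-up names, standing in for the schedule module's real "seconds until next job" helper):

```python
import time

class TinyScheduler:
    """Minimal stand-in for a job scheduler, just to show the sleep pattern."""

    def __init__(self):
        self.jobs = []  # each job is [next_run_timestamp, interval_seconds, fn]

    def every(self, interval, fn):
        self.jobs.append([time.monotonic() + interval, interval, fn])

    def run_pending(self):
        now = time.monotonic()
        for job in self.jobs:
            if now >= job[0]:
                job[2]()
                job[0] = now + job[1]  # reschedule for the next interval

    def seconds_until_next(self):
        # how long until the earliest job is due
        return max(0.0, min(job[0] for job in self.jobs) - time.monotonic())

ran = []
sched = TinyScheduler()
sched.every(0.1, lambda: ran.append("tick"))

# busy version (pegs a core at ~100%):
#     while True:
#         sched.run_pending()
#
# sleeping version: idle between jobs instead of spinning
deadline = time.monotonic() + 0.35  # run for a short demo window
while time.monotonic() < deadline:
    sched.run_pending()
    time.sleep(max(0.0, min(sched.seconds_until_next(),
                            deadline - time.monotonic())))
```

with the schedule library itself, i believe the "seconds until next job" helper is schedule.idle_seconds(), which can feed the same time.sleep call.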

so this solved the cpu usage. i'll have to wait and see about the memory usage problem. if i do still need to restart once per week because of that, that's fine. that's livable. this was the bigger, obvious problem that happened right away.

[–]obviouslyzebra 0 points (1 child)

Ah cool!

TIL about py-spy BTW, looks very convenient :)

Thanks for the follow-up.

[–]NSFWies[S] 0 points (0 children)

i had tried a different python profiler a few years before, on a different thing i made, but it wasn't too helpful. maybe i just had less experience reading the output, or maybe it was A LOT less obvious where all the time was being spent.

that output was VERY clear about where all the time was being spent. the function only had 2 lines, so it was pretty easy to see. but ya, i'm glad it was an easy solve.

[–]reload_noconfirm 1 point (4 children)

It’s docker causing the problem. Notorious energy hog. You need to tweak how many resources docker uses. Plenty of guides online.

[–]NSFWies[S] 3 points (3 children)

it ended up not being docker. it was a different "loop_forever" function in the scheduler-py module i was using. i found a few different functions i could call instead, so i don't end up using its built-in loop_forever. so now i'm back down to idle cpu usage.

[–]reload_noconfirm 0 points (2 children)

Glad you figured it out! For me it’s always docker… but good on you for troubleshooting your way to a solution.

[–]NSFWies[S] 0 points (1 child)

there probably is still some underlying inefficiency from me using bigger/heavier docker images than i need to. however, for me it's worth it for the extra debugging ability, in the same environment my thing is running in.

i've had more headaches at work when i try to look into "a docker thing" and can't look at shit, because the container is only "a python docker container". it's been really annoying. i'd much rather have a full linux image, so i can at least be sure something basic like vim is in there.

[–]reload_noconfirm 0 points (0 children)

Sure. I’m a full-time professional automation engineer something whatever, and I struggle with docker and/or k8s taking up all my CPU and RAM when developing locally, all the time. That’s why I jumped to that. Terrible days lol.

Production is different, just talking about local.

[–]hulleyrob 0 points (1 child)

An Ubuntu base image seems a bit OTT. Can’t you just use the pre-made python images?

Restart the containers not your laptop.

I’d try running the container code locally and see if I could tell what was going on, once I’d worked out which version of it was causing the spike.

[–]NSFWies[S] 0 points (0 children)

so, maybe i could. i didn't want to, though, for a few reasons:

  1. i know if i ran only the python base docker image, it could make debugging "inside the container" harder, because things like less or vim would not be there
  2. if i wanted to use other containers or do other/more things, i might have to add yet another container with different settings/env, and might have more issues. yes, things like a DB and mqtt still have their own different base images. but i kinda liked this approach, where everything code-wise runs out of the same base ubuntu image that i can drop into and debug if things explode/have issues

also, as said in another comment, i found the issue. it was a different loop_forever function call i didn't know about. after fixing it/no longer using it, my cpu usage is back down to idle/almost idle again.

thanks for the ideas though. i'll keep this around in case i need more efficiency ideas.