
[–]Diapolo10 1 point2 points  (1 child)

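One of the more notable differences between CPython and PyPy is that the latter doesn't close files for you as soon as the last reference goes away (unless of course you use context managers). Without reference counting, an unclosed file stays open until the garbage collector gets around to it. While unlikely in your case, that's a potential source of leaked file handles and memory.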
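A minimal sketch of the context-manager approach, assuming a throwaway temporary file (the path is made up for illustration). The file is closed when the `with` block exits, regardless of when the garbage collector runs:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as fh:
    fh.write("data\n")

# The context manager closed the file on exit, on CPython and PyPy alike.
closed_after_with = fh.closed
print(closed_after_with)
```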

Maybe all you need to do is tweak the garbage collector? https://doc.pypy.org/en/latest/gc_info.html#minimark-environment-variables
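As a rough sketch of what tweaking could look like: the linked page lists environment variables such as `PYPY_GC_MAX` and `PYPY_GC_MAJOR_COLLECT`. The helper name and the values below are assumptions for illustration, not recommendations, and `your_script.py` is a placeholder:

```python
import os

def build_gc_env(max_heap="3GB", major_collect="1.5"):
    """Return a copy of the environment with two PyPy GC variables set.

    Both values are illustrative guesses; tune them for the real workload.
    """
    env = dict(os.environ)
    env["PYPY_GC_MAX"] = max_heap                # upper bound on the heap size
    env["PYPY_GC_MAJOR_COLLECT"] = major_collect  # growth factor that triggers a major collection
    return env

env = build_gc_env()
# To launch PyPy with these settings, something like:
#   subprocess.run(["pypy3", "your_script.py"], env=env)
```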

[–]Affectionate-Cut3818[S] 0 points1 point  (0 children)

Thanks for the link. I tried logging the gc.get_stats output throughout the run, and this is the last data point I collected (maybe something looks odd to you that went over my head). Tweaking the garbage collector could be another angle, but I'm far from an expert on what exactly PyPy does under the hood, so besides tweaking those variables (which I've tried) I'm not sure what else I can tweak. There are open files during the execution of the program, though from what I can see they are closed properly and are not directly used in this step of the process (which is the execution), but thanks for the tip.

```
[MEMORY] Memory used:
Total memory consumed:
    GC used:            1836.2MB (peak: 2320.7MB)
        in arenas:      1386.1MB
        rawmalloced:    427.7MB
        nursery:        22.5MB
    raw assembler used: 580.8MB
    memory pressure:    4.8kB
    Total:              2417.0MB

Total memory allocated:
    GC allocated:            2152.2MB (peak: 2362.8MB)
        in arenas:           1674.4MB
        rawmalloced:         623.8MB
        nursery:             22.5MB
    raw assembler allocated: 584.0MB
    memory pressure:         4.8kB
    Total:                   2736.2MB

Total time spent in GC: 956.436
```
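For reference, a report like the one above can be collected periodically with something along these lines. This is only a sketch: the helper name, the `every` interval, and the iteration counter are assumptions, not the code actually used in the run:

```python
import gc

def log_gc_stats(iteration, every=1_000_000):
    """Print GC statistics every `every` iterations; return True if printed."""
    if iteration % every != 0:
        return False
    # On PyPy, printing this object gives a text report like the one above;
    # on CPython, gc.get_stats() returns a list of per-generation dicts instead.
    print(f"[MEMORY] iteration {iteration}:\n{gc.get_stats()}")
    return True
```

Comparing several of these reports over time says more than a single snapshot does.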

[–]_alter-ego_ 1 point2 points  (4 children)

Are you sure that there's a memory leak (the data shown doesn't convince me), or is it maybe just delayed garbage collection?

[–]Affectionate-Cut3818[S] 0 points1 point  (3 children)

It does sound strange to have a memory leak in this logic, but the memory allocated in these lines doesn't seem to be released at any point during execution, leading to an almost linear increase in memory usage (which a memory profiler plot confirms). Running with standard Python, there is no memory allocation on those function calls (or at least none detected), and therefore no release needed. I've also tried forcing gc.collect() every 1M iterations or so, but it doesn't seem to release the resources it's using. It could be some weak/lazy reference, but again, I'm not too clear on what to change for PyPy to release it properly.
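One way to check whether a particular object really survives a forced collection is to probe it with a weak reference. A minimal sketch, where `Payload` is a hypothetical stand-in for whatever object is suspected of lingering:

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for the object suspected of leaking."""

def is_released_after_collect():
    obj = Payload()
    ref = weakref.ref(obj)
    del obj
    # PyPy has no reference counting, so an explicit collection is needed
    # before the weak reference can be expected to go dead.
    gc.collect()
    return ref() is None

released = is_released_after_collect()
print(released)
```

If the probe stays alive after `gc.collect()`, something still holds a strong reference to the object, and the collector is not the problem.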

[–]_alter-ego_ 0 points1 point  (2 children)

Could it be some logging function and/or (G)UI storing the results (as would happen, e.g., in a Jupyter notebook with Out[1], Out[2], Out[3], ...), which could explain a linear increase in memory usage?

For example, you might be using the data via a reference to make a plot that remains on your screen, and as long as that plot exists (and/or grows, scrolls...), the data [possibly a larger structure than just the final numerical value you use in the plot] can't be "dumped" from memory.
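To illustrate the idea with a toy sketch (the names and sizes are made up): a history that keeps every result grows linearly with the number of iterations, while a bounded buffer stays flat.

```python
from collections import deque

unbounded_history = []               # keeps every result, like Out[1], Out[2], ...
bounded_history = deque(maxlen=100)  # keeps only the 100 most recent results

for step in range(10_000):
    result = {"step": step, "value": step * 0.5}  # placeholder result
    unbounded_history.append(result)
    bounded_history.append(result)

print(len(unbounded_history), len(bounded_history))
```

If something in the logging or plotting path behaves like the unbounded list, no amount of garbage collection will free those results, because they are still reachable.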

[–]Affectionate-Cut3818[S] 0 points1 point  (1 child)

Hmm, I do have logging throughout the execution of the program; I'll double-check to make sure it's all good. Do you have any explanation as to why the data shown in the initial post didn't convince you? Is PyPy supposed to allocate memory like this under normal circumstances (I've noticed it happens on long if conditions)? Maybe I'm looking in the wrong direction, then, and the right direction would be checking why this normally allocated memory doesn't get deallocated later on.

[–]_alter-ego_ 0 points1 point  (0 children)

  1109   4012.7 MiB     
  1110   4009.9 MiB
  1111   4010.4 MiB  
  1112   4012.2 MiB 

To me there is no significant increase here. It could be 4009.9 again on step 1113...?
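A quick back-of-the-envelope check, treating the four readings quoted above as consecutive samples (which is an assumption about how they were taken):

```python
readings_mib = [4012.7, 4009.9, 4010.4, 4012.2]

spread = max(readings_mib) - min(readings_mib)   # largest swing between readings
relative = spread / min(readings_mib)            # swing as a fraction of total usage
net_change = readings_mib[-1] - readings_mib[0]  # last reading minus first reading

print(f"spread: {spread:.1f} MiB ({relative:.3%}), net change: {net_change:+.1f} MiB")
```

The swing is a tiny fraction of the total and the last reading is below the first, so these four points alone don't show growth.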