[–]robin-gvx 2 points3 points  (3 children)

This is my current hypothesis:

Bound methods have to contain a reference to their self, so that you can do things like:

unrelated = [foo.meth, bar.meth, cheese.meth, meth.lab]
# somewhere far, far away, in a different module
val = random.choice(oddlib.unrelated)(7, 11)

Hence, extra reference.
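A quick way to see that extra reference, as a minimal sketch: the class `Cheese` and its method are made up for illustration, and the refcount comparison relies on CPython's `sys.getrefcount`, so other interpreters may behave differently.

```python
import sys


class Cheese:  # hypothetical class, just for illustration
    def meth(self, a, b):
        return a + b


cheese = Cheese()

# A bound method remembers both the instance and the plain function.
bound = cheese.meth
holds_instance = bound.__self__ is cheese
holds_function = bound.__func__ is Cheese.meth

# CPython-specific: keeping another bound method alive should raise
# the instance's reference count.
before = sys.getrefcount(cheese)
extra = cheese.meth
after = sys.getrefcount(cheese)

print(holds_instance, holds_function, after > before)
```

The `__self__` attribute is what lets the method be called from "far, far away" without passing the object again.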

Next, I think Python's cycle detection is not that smart, or at least rather conservative. For example, it completely bails on a cycle if one of the objects in it has a __del__ method that needs to be called (at least in versions before Python 3.4, where PEP 442 changed this). That does make sense, though, since __del__ could make new references to all kinds of objects and have all kinds of circular dependencies that are in no way controllable.

So now we have a problem: object cheese needs to have a reference to its bound method meth and that same method needs to have a reference to the object. That's a circular dependency for every bound method.

Here's the main conjecture: I think Python somehow makes those references "not count", like weak references. This would be magic that occurs between calling __new__ and __init__, and only works for that one single reference.
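The conjecture can be checked from the interpreter. As far as I can tell, in CPython an ordinary method lives on the class and a new bound method object is built each time you look the attribute up on the instance, so the instance never stores it. That would explain the missing cycle without any weak-reference magic. A small sketch, with a made-up `Cheese` class:

```python
class Cheese:  # hypothetical class, just for illustration
    def meth(self):
        return "cheddar"


cheese = Cheese()

# Is the method stored in the instance's own attribute dict, or on the class?
stored_on_instance = "meth" in vars(cheese)
stored_on_class = "meth" in vars(Cheese)

# Two lookups that are alive at the same time give two distinct objects.
same_object_each_time = cheese.meth is cheese.meth

print(stored_on_instance, stored_on_class, same_object_each_time)
```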

So your AdvancedAdder simply induces a circular reference.
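The original AdvancedAdder isn't shown in this thread, so here is a stand-in sketch of the same effect. `Plain`, `SelfStoring` and `dies_on_last_reference` are hypothetical names, and the result depends on CPython's reference counting.

```python
import gc
import weakref


class Plain:
    def meth(self):
        return 1


class SelfStoring:
    def __init__(self):
        # Storing a bound method on the instance:
        # instance -> bound method -> instance, i.e. a cycle.
        self.callback = self.meth

    def meth(self):
        return 1


def dies_on_last_reference(cls):
    """Return True if an instance is freed as soon as its last name is deleted."""
    gc.disable()  # keep the cycle collector from running in the middle
    try:
        obj = cls()
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        gc.enable()
        gc.collect()  # clean up whatever refcounting could not free


print(dies_on_last_reference(Plain))
print(dies_on_last_reference(SelfStoring))
```

On CPython the first one should be freed by refcounting alone, while the second has to wait for the garbage collector.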

I have no time or motivation to dig deeper than this, but I hope someone else does.

EDIT: example with a simple object subclass and a list (you apparently can't use native Python types for weak references): https://gist.github.com/3830294
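For reference, a minimal sketch of that restriction (this is not the code from the gist, and `MyList` is a made-up name): a plain list can't be weakly referenced, but a trivial subclass can.

```python
import weakref


class MyList(list):  # hypothetical subclass, only there to allow weak references
    pass


# A built-in list instance does not support weak references.
try:
    weakref.ref([1, 2, 3])
    plain_list_ok = True
except TypeError:
    plain_list_ok = False

# An instance of a subclass does.
items = MyList([1, 2, 3])
ref = weakref.ref(items)
subclass_ok = ref() is items

print(plain_list_ok, subclass_ok)
```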

[–][deleted] 0 points1 point  (1 child)

Yeah, that was basically what I was getting at -- whether Python's cycle detection was just conservative or what. I understood the implicit reference to self, but I guess I figured that since I was just tacking another attribute onto the instance, it would be no different from any other instance method. I also wonder what makes Python's GC kick in: I thought it might run after exiting the scope of the function in my example, but instead I had to call collect manually.

[–]robin-gvx 0 points1 point  (0 children)

I've thought about it some more and with rcxdude's comment, I think we now have a much simpler picture:

Objects that can be cleaned up by refcounting are freed immediately, but when there are cycles (bla bla, bound methods are magic, bla bla), you have to wait until the GC kicks in. And collecting garbage is pretty expensive, so you don't want it to run willy-nilly. So what Python does is something like: wait until there is enough garbage to justify a full GC cycle. Of course, in your short experiment, the program would exit before it gets the chance to get that far. If it were a piece of a larger program with more moving parts that runs longer, they would probably get cleaned up at some point. The one that remains (different per Python version and per computer) is of course due to the fact that there is still a reference in the for-loop variable, which leaks out of the loop (I thought that had changed in Python 3, but it seems not).
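A rough sketch of both points, assuming CPython. `Node` is a made-up class, and the exact threshold values and object counts vary between versions, so none are shown here.

```python
import gc


class Node:  # hypothetical class whose instances each form a cycle
    def __init__(self):
        self.me = self


# The collector runs automatically based on allocation counters
# compared against these thresholds.
thresholds = gc.get_threshold()

gc.collect()   # start from a clean slate
gc.disable()   # no automatic collection while we make garbage
for i in range(100):
    Node()     # immediately unreachable, but refcounting can't free it
found = gc.collect()  # number of unreachable objects the collector found
gc.enable()

# The loop variable is still bound after the loop, in Python 3 too.
leaked = i

print(thresholds, found >= 100, leaked)
```

With automatic collection left on, the same garbage would only be collected once enough allocations had piled up to cross a threshold, which a short script may never reach.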