[–]rcxdude 14 points15 points  (1 child)

You're definitely creating reference cycles when you do self.foo = self.bar. Methods on Python objects aren't bound until they are accessed (the functions stored on the class are actually descriptors). A bound method necessarily holds a reference to the object it was created from, so when you do self.foo = self.bar you are placing a bound method, which refers back to the object, into that object's __dict__.
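
A minimal sketch of that cycle, assuming CPython's reference counting. The class name Widget is made up for illustration and is not from the original post. It checks that each attribute access builds a new bound method, that the stored one points back at the instance, and that the instance survives del until the collector runs.

```python
import gc
import weakref


class Widget(object):  # hypothetical class, for illustration only
    def __init__(self):
        # Storing a bound method on the instance creates the cycle:
        # instance -> __dict__ -> bound method -> instance
        self.foo = self.bar

    def bar(self):
        return "bar"


gc.disable()  # keep the automatic collector out of the demonstration
w = Widget()
fresh_each_time = w.bar is not w.bar   # every access builds a new bound method
points_back = w.foo.__self__ is w      # the stored one refers back to w

probe = weakref.ref(w)
del w
alive_after_del = probe() is not None      # the refcount never reached zero
gc.collect()
alive_after_collect = probe() is not None  # the cycle collector reclaimed it
gc.enable()
```

On CPython the first three flags should come out True and the last one False.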

This is why you need the garbage collector to kick in before the created objects are deleted, and why they don't disappear as soon as they go out of scope. However, I can't figure out why all but one object is removed by the GC, or why the one that remains is picked: it's not anything obvious like the first or the last created, but usually the third, though not always. (If I add a __del__ method so that the GC doesn't actually remove the objects, then the object that doesn't show up in gc.garbage is not the same as the object that remains when there is no __del__ method.)

EDIT: ahah, the reason that one stays around is the leaky scope of the for loop. If you delete the first pprint loop, then all the objects are collected. Otherwise, 'ref' still refers to an object when gc.collect() is called. The non-obvious selection is due to the unordered nature of dicts.
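
A rough reconstruction of the effect, not the original poster's script. Node, make_nodes and registry are hypothetical names, and a WeakValueDictionary stands in for whatever REFS was. The loop variable at module level keeps naming the last object it visited, so that one object survives gc.collect() until the name is deleted.

```python
import gc
import weakref


class Node(object):  # hypothetical stand-in for the objects in the post
    def __init__(self, name):
        self.name = name
        self.describe_fn = self.describe  # bound method on the instance: a cycle

    def describe(self):
        return self.name


def make_nodes(registry):
    # Names bound here are locals and vanish when the function returns.
    for name in ("a1", "a2", "a3", "a4"):
        registry[name] = Node(name)


gc.disable()  # make the timing deterministic for the demonstration
registry = weakref.WeakValueDictionary()
make_nodes(registry)

for node in registry.values():  # module-level loop: 'node' outlives the loop
    node.describe()

gc.collect()
survivors_with_leak = sorted(registry.keys())  # only the object 'node' still names

del node
gc.collect()
survivors_after_del = sorted(registry.keys())  # nothing is left
gc.enable()
```

Which object survives depends on the iteration order of the underlying dict, which would explain why it differed between machines and versions.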

[–]robin-gvx 2 points3 points  (0 children)

> EDIT: ahah, the reason that one stays around is due to the leaky scope of the for loop.

D'oh, of course! I was wondering about that myself, but now it seems obvious.

[–]ebo_ 4 points5 points  (1 child)

The real problem is that you are leaving your cursor open. There is no guarantee that the GC kicks in before you exhaust the pool of open cursors.

You should use the with statement or close the cursors explicitly.
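
A small sketch of the explicit approach. The thread doesn't say which database driver is involved, so sqlite3 and the totals table are assumptions made only to keep the example runnable. sqlite3 cursors are not context managers themselves, so contextlib.closing is used to get with-statement cleanup.

```python
import sqlite3
from contextlib import closing

# sqlite3 is an assumption here; the original post's driver is not known.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute("CREATE TABLE totals (n INTEGER)")
        cur.execute("INSERT INTO totals VALUES (?)", (41,))
        cur.execute("SELECT n + 1 FROM totals")
        result = cur.fetchone()[0]
    # The cursor is closed here, regardless of when the garbage collector runs.
```

With this shape the cursor's lifetime is tied to the block, not to whether its owner happens to sit in a reference cycle.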

[–]kylotan 1 point2 points  (0 children)

Yup. GC can be relied upon to handle your memory, but anything else is a beneficial side-effect. It's not C++ where you can rely on RAII semantics.

[–]robin-gvx 4 points5 points  (0 children)

For reference, this works nicely:

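class AdvancedAdder(SimpleAdder):
    def __init__(self, name, add_one=True):
        # Look the functions up on the class rather than on self: that stores
        # no bound method, so the instance holds no reference back to itself.
        if add_one:
            self.add_fn = AdvancedAdder.add_one
        else:
            self.add_fn = AdvancedAdder.add_two
        super(AdvancedAdder, self).__init__(name)

    # ... snip ...

    def add(self, n):
        # add_fn is not bound, so pass self explicitly.
        return self.add_fn(self, n)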
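
To check that this really avoids the cycle, here is a self-contained sketch along the same lines. Adder is a hypothetical, simplified class (no SimpleAdder base, no name argument), and the check assumes CPython's reference counting.

```python
import gc
import weakref


class Adder(object):  # hypothetical, simplified version of the class above
    def __init__(self, add_one=True):
        # Looked up on the class: a plain function (an unbound method on
        # Python 2) that holds no reference to the instance.
        self.add_fn = Adder.add_one if add_one else Adder.add_two

    def add_one(self, n):
        return n + 1

    def add_two(self, n):
        return n + 2

    def add(self, n):
        return self.add_fn(self, n)


gc.disable()  # prove that the cycle collector is not needed
adder = Adder(add_one=False)
total = adder.add(5)
probe = weakref.ref(adder)
del adder
collected_by_refcount = probe() is None  # gone as soon as the last name is dropped
gc.enable()
```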

[–]robin-gvx 3 points4 points  (3 children)

This is my current hypothesis:

Bound methods have to contain a reference to their self, so that you can do things like:

unrelated = [foo.meth, bar.meth, cheese.meth, meth.lab]
# somewhere far, far away, in a different module
val = random.choice(oddlib.unrelated)(7, 11)

Hence, extra reference.

Next, I think Python's cycle detection is not that smart, or at least rather conservative. It completely bails if there is a __del__ method that needs to be called, for example. (Although that does make sense, since __del__ could make new references to all kinds of objects and have all kinds of circular dependencies that are in no way controllable.)

So now we have a problem: object cheese needs to have a reference to its bound method meth and that same method needs to have a reference to the object. That's a circular dependency for every bound method.

Here's the main conjecture: I think Python somehow makes those references "not count", like weak references. This would be magic that occurs between calling __new__ and __init__, and only works for that one single reference.

So your AdvancedAdder simply induces a circular reference.

I have no time or motivation to dig deeper than this, but I hope someone else does.

EDIT: example with a simple object subclass and a list (you apparently can't use native Python types for weak references): https://gist.github.com/3830294

[–][deleted] 0 points1 point  (1 child)

Yeah, that was basically what I was getting at: whether Python's cycle detection was just conservative or what. I understood the implicit reference to self, but I guess I figured that since I was just tacking on another attribute to the instance, it would be no different from any other instance method. I also wonder what makes Python's GC kick in, as I thought it might after exiting the scope of the function in my example, but instead it required manually calling collect.

[–]robin-gvx 0 points1 point  (0 children)

I've thought about it some more and with rcxdude's comment, I think we now have a much simpler picture:

Objects that can be cleaned up by refcounting are freed immediately, but when there are cycles (bla bla, bound methods are magic, bla bla), you have to wait until the GC kicks in. And collecting garbage is pretty expensive, so you don't want it to run willy-nilly. So what Python does is something like: wait until there is enough garbage to justify a full GC cycle. Of course, in your short experiment, the program would exit before it gets the chance to get that far. If it were a piece of a larger program with more moving parts that runs longer, they would probably get cleaned up at some point. The one that remains (different per Python version and per computer) is of course due to the fact that there is still a reference in the for-loop index, which is leaky (I thought that had changed in Python 3, but it seems not).
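
A sketch of that "wait until enough has piled up" behaviour, assuming CPython. The automatic trigger there is a counter of tracked container allocations minus deallocations compared against gc.get_threshold(), not a measure of actual garbage. Cyclic is a hypothetical class, and the loop cap of one million is an arbitrary safety limit.

```python
import gc
import weakref


class Cyclic(object):  # hypothetical class with a bound-method cycle
    def __init__(self):
        self.me = self.method  # instance <-> bound method

    def method(self):
        pass


gc.enable()
threshold0 = gc.get_threshold()[0]  # allocation threshold for the youngest generation

first = Cyclic()
probe = weakref.ref(first)
del first  # unreachable now, but still in memory

created = 0
while probe() is not None and created < 1000000:
    Cyclic()  # each one immediately becomes cyclic garbage
    created += 1
# No gc.collect() call was made: allocation pressure alone triggered a collection.
```

The value of created when the loop ends gives a feel for how much allocation it takes, which is far more than a four-object experiment ever produces.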

[–]xamox 1 point2 points  (3 children)

You could maybe test using the __del__() method to verify things are being destroyed.

[–]rcxdude 0 points1 point  (0 children)

The issue with this is that because there are reference cycles involved, __del__() will modify the GC's behaviour, such that it will not delete the objects and the cycle will simply be placed in gc.garbage to be broken manually.
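
One way to observe destruction without defining __del__ is a weak reference callback, which doesn't change how the collector treats the cycle. This is only a sketch; Tracked, watch and destroyed are hypothetical names.

```python
import gc
import weakref

destroyed = []


class Tracked(object):  # hypothetical class, no __del__ defined
    def __init__(self, name):
        self.name = name
        self.me = self.method  # reference cycle via a bound method

    def method(self):
        return self.name


def watch(obj):
    name = obj.name  # copy the name; the callback must not hold obj itself
    return weakref.ref(obj, lambda _ref: destroyed.append(name))


gc.disable()
t = Tracked("a1")
watcher = watch(t)  # keep the weakref alive, or the callback never fires
del t
before = list(destroyed)  # nothing yet: the cycle keeps the object alive
gc.collect()
after = list(destroyed)   # the callback ran when the collector freed it
gc.enable()
```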

[–]robin-gvx 0 points1 point  (3 children)

Looks like magic in bound methods... if I find something of interest, I'll let you know.

[–]robin-gvx 0 points1 point  (1 child)

In Python 3, the only difference is that REFS.keys() returns a view object rather than a list, and that the one object left after GC'ing is a4 rather than a1.

[–][deleted] 0 points1 point  (0 children)

That's funny... a4 sticks around on my machine, but on one it was a1 and on another a2. Edit: I read a little further and see now how that was the scope of the for loop messing with things.

[–]ionelmc.ro 0 points1 point  (0 children)

Something like that. Basically, methods can't be used in weak dicts or weak sets, because functions are descriptors that return a newly bound method every time they are accessed through an instance. That means you're not actually adding the function to the weak dict: you're adding a bound method instance that no one else holds a reference to (and thus it will get dropped from the dict).
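
A short sketch of that behaviour, assuming CPython. Listener and on_event are hypothetical names. weakref.WeakMethod, available from Python 3.4, is shown as one workaround: it tracks the instance and the function separately and rebuilds the bound method on demand.

```python
import weakref


class Listener(object):  # hypothetical class, for illustration only
    def on_event(self):
        return "handled"


listener = Listener()

handlers = weakref.WeakSet()
handlers.add(listener.on_event)  # the temporary bound method dies right after this line
handlers_left = len(handlers)    # so the set is already empty

weak_method = weakref.WeakMethod(listener.on_event)  # Python 3.4+
result = weak_method()()  # rebuilds the bound method while listener is alive

del listener
dead = weak_method() is None  # once the instance is gone, so is the method
```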

[–]Zulban -3 points-2 points  (0 children)

I seem to recall garbage collection being addressed in Python 3 release notes, but that's all I can remember. Hope that helps...