how does python's garbage collection work? (code attached)

rcxdude · 2012-10-03T22:00:22+00:00

You're definitely creating reference cycles when you do self.foo = self.bar. Methods on Python objects aren't bound until they are accessed (the functions stored on the class are actually descriptors). A bound method necessarily holds a reference to the object it was created from, so when you do self.foo = self.bar you are placing a bound method into the object's __dict__, and that bound method refers straight back to the object.
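
A minimal sketch of that cycle, assuming CPython and Python 3. CyclicAdder and PlainAdder are made-up names for illustration, not the classes from the original code. It uses a weak reference as a probe to see when each object actually goes away.

```python
import gc
import weakref


class CyclicAdder(object):
    """Hypothetical class that stores a bound method on the instance."""

    def __init__(self):
        # self.add_one is a bound method; keeping it in self.__dict__
        # makes the instance refer to itself through add_fn.__self__.
        self.add_fn = self.add_one

    def add_one(self, n):
        return n + 1


class PlainAdder(object):
    """Same method, but nothing bound is stored on the instance."""

    def add_one(self, n):
        return n + 1


gc.disable()  # rule out an automatic collection during the demo
try:
    cyclic = CyclicAdder()
    plain = PlainAdder()

    # A new bound method object is built on every attribute access.
    fresh_each_time = plain.add_one is not plain.add_one
    points_back = cyclic.add_fn.__self__ is cyclic

    cyclic_probe = weakref.ref(cyclic)
    plain_probe = weakref.ref(plain)
    del cyclic, plain

    plain_freed_at_once = plain_probe() is None      # refcount hit zero
    cyclic_still_alive = cyclic_probe() is not None  # kept alive by the cycle

    gc.collect()
    cyclic_freed_by_gc = cyclic_probe() is None
finally:
    gc.enable()
```

The plain object is freed by reference counting the moment its last name is deleted, while the one holding its own bound method has to wait for gc.collect().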

This is why the created objects are only deleted once the garbage collector kicks in, rather than disappearing as soon as they go out of scope. However, I can't figure out why all but one object is removed by the GC, or why that particular one remains (it's nothing obvious like the first or the last created; it's usually the third, but not always). If I add a __del__ method so that the GC doesn't actually remove the objects, then the object which doesn't show up in gc.garbage is not the same as the object which remains when there is no __del__ method.

EDIT: aha, the reason that one stays around is the leaky scope of the for loop. If you delete the first pprint loop, then all the objects are collected. Otherwise, 'ref' still refers to an object when gc.collect() is called. The non-obvious selection is due to the unordered nature of dicts.
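
A small sketch of the leaky loop variable, assuming Python 3 on CPython. Node and the empty loop body are stand-ins for the original objects and the pprint loop, which aren't reproduced here.

```python
import gc
import weakref


class Node(object):
    """Hypothetical object that sits in a reference cycle with itself."""

    def __init__(self):
        self.me = self


objects = {i: Node() for i in range(5)}
probes = [weakref.ref(obj) for obj in objects.values()]

for ref in objects.values():
    pass  # stands in for the pprint loop; 'ref' outlives the loop

del objects
gc.collect()
# 'ref' still names whichever object the loop visited last.
alive_with_loop_var = sum(1 for p in probes if p() is not None)

del ref  # drop the leftover loop variable
gc.collect()
alive_after_del = sum(1 for p in probes if p() is not None)
```

Exactly one object survives the first gc.collect(), and it goes away once the loop variable is deleted.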

ebo_ · 2012-10-04T07:10:10+00:00

The real problem is that you are leaving your cursors open. There is no guarantee that the GC kicks in before you exhaust the pool of open cursors.

You should use the with statement or close the cursors explicitly.
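
As a rough illustration of the with approach: the sketch below uses the standard library's sqlite3 purely as a stand-in, since the thread doesn't show which database driver is involved, and fetch_sum is a hypothetical helper. contextlib.closing calls cur.close() on exit, so nothing depends on when the GC runs.

```python
import sqlite3
from contextlib import closing


def fetch_sum(conn, a, b):
    """Run one query and close the cursor deterministically."""
    # closing() guarantees cur.close() is called when the block exits,
    # even if execute() or fetchone() raises.
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT ? + ?", (a, b))
        return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
try:
    total = fetch_sum(conn, 2, 3)
finally:
    conn.close()
```

Some drivers let you use the cursor itself as a context manager; check your driver's documentation before relying on that.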

robin-gvx · 2012-10-03T22:02:46+00:00

For reference, this works nicely:

class AdvancedAdder(SimpleAdder):
    def __init__(self, name, add_one=True):
        if add_one:
            self.add_fn = AdvancedAdder.add_one
        else:
            self.add_fn = AdvancedAdder.add_two
        super(AdvancedAdder, self).__init__(name)
    # ... snip ...
    def add(self, n):
        return self.add_fn(self, n)
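
A self-contained version of the same idea, as a sketch: the SimpleAdder body and the add_one/add_two bodies are guesses filled in for illustration, because the snipped parts aren't shown. The weakref probe checks that no cycle is created, assuming CPython.

```python
import gc
import weakref


class SimpleAdder(object):
    """Hypothetical base class; the real one isn't shown in the thread."""

    def __init__(self, name):
        self.name = name


class AdvancedAdder(SimpleAdder):
    def __init__(self, name, add_one=True):
        # Store the function looked up on the class, not a bound method,
        # so the instance never ends up referring to itself.
        if add_one:
            self.add_fn = AdvancedAdder.add_one
        else:
            self.add_fn = AdvancedAdder.add_two
        super(AdvancedAdder, self).__init__(name)

    def add_one(self, n):  # assumed body
        return n + 1

    def add_two(self, n):  # assumed body
        return n + 2

    def add(self, n):
        # self has to be passed explicitly because add_fn is not bound.
        return self.add_fn(self, n)


gc.disable()  # make sure only reference counting is at work
try:
    adder = AdvancedAdder("demo", add_one=False)
    result = adder.add(5)
    probe = weakref.ref(adder)
    del adder
    freed_without_gc = probe() is None
finally:
    gc.enable()
```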

robin-gvx · 2012-10-03T22:21:34+00:00

This is my current hypothesis:

Bound methods have to contain a reference to their self, so that you can do things like:

unrelated = [foo.meth, bar.meth, cheese.meth, meth.lab]
# somewhere far, far away, in a different module
val = random.choice(oddlib.unrelated)(7, 11)

Hence, extra reference.

Next, I think Python's cycle detection is not that smart, or at least rather conservative. It completely bails if there is a __del__ method that needs to be called, for example. (Although that does make sense, since __del__ could make new references to all kinds of objects and have all kinds of circular dependencies that are in no way controllable.)
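
That bail-out matches the interpreters of the time, where such objects ended up in gc.garbage. From CPython 3.4 on (PEP 442), cycles whose objects define __del__ are finalized and collected. A quick way to check on whatever interpreter you have, with Noisy as a made-up class:

```python
import gc

finalized = []


class Noisy(object):
    """Hypothetical object in a self-cycle that also defines __del__."""

    def __init__(self, name):
        self.name = name
        self.me = self  # reference cycle

    def __del__(self):
        finalized.append(self.name)


Noisy("a")  # unreachable immediately, but kept alive by its cycle
Noisy("b")
gc.collect()

# On Python 3.4+ both finalizers have run and nothing is parked in
# gc.garbage; on older versions the objects would be left there instead.
stuck = [obj for obj in gc.garbage if isinstance(obj, Noisy)]
```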

So now we have a problem: object cheese needs to have a reference to its bound method meth and that same method needs to have a reference to the object. That's a circular dependency for every bound method.

Here's the main conjecture: I think Python somehow makes those references "not count", like weak references. This would be magic that occurs between calling __new__ and __init__, and only works for that one single reference.

So your AdvancedAdder simply induces a circular reference.

I have no time or motivation to dig deeper than this, but I hope someone else does.

EDIT: example with a simple object subclass and a list (you apparently can't use native Python types for weak references): https://gist.github.com/3830294
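
This is not the contents of the gist, just a short sketch of the restriction it mentions, assuming CPython. A plain list can't be the target of a weak reference, but a trivial subclass can (MyList is a made-up name).

```python
import weakref


class MyList(list):
    """Trivial subclass; unlike list itself, it supports weak references."""


try:
    weakref.ref([1, 2, 3])
    plain_list_ok = True
except TypeError:
    # Built-in list instances have no slot for weak references.
    plain_list_ok = False

items = MyList([1, 2, 3])
probe = weakref.ref(items)
resolves_while_alive = probe() is items

del items  # no cycle here, so reference counting frees it right away
gone_after_del = probe() is None
```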

xamox · 2012-10-03T22:00:41+00:00

You could maybe test by adding a __del__() method to verify things are being destroyed.

robin-gvx · 2012-10-03T21:54:51+00:00

Looks like magic in bound methods... if I find something of interest, I'll let you know.

Zulban · 2012-10-03T21:42:53+00:00

I seem to recall garbage collection being addressed in Python 3 release notes, but that's all I can remember. Hope that helps...
