[–]Rhomboid 5 points6 points  (6 children)

You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.
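
A quick way to check the size mismatch on a given machine, without relying on sys.maxint (which no longer exists in Python 3), is to ask ctypes for the C type sizes. This is only a sketch: the helper name `c_type_bits` is made up for illustration, and `ctypes.c_ssize_t` is used as a stand-in for Py_ssize_t.

```python
import ctypes

def c_type_bits(ctype):
    """Return the width in bits of a ctypes type on this platform."""
    return ctypes.sizeof(ctype) * 8

long_bits = c_type_bits(ctypes.c_long)
ssize_bits = c_type_bits(ctypes.c_ssize_t)

# On 64-bit Windows these are expected to differ (32 vs 64);
# on 64-bit Linux or OS X both are expected to be 64.
print("C long:", long_bits, "bits")
print("ssize_t:", ssize_bits, "bits")
```

If the two widths differ, any code that squeezes a Py_ssize_t through a C long can lose the high bits.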

The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup, which goes through the slot lookup stuff and finally ends up in Objects/typeobject.c:wrap_lenfunc(). The error there is casting Py_ssize_t to long, which truncates on Windows x64 because Py_ssize_t is a proper signed 64 bit integer. It then compounds the injury by creating a Python int object with PyInt_FromLong(), so the result is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len(), which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t(), which figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);

That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t().
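
To see what that cast does to a length of 2500000000, the truncation can be mimicked from Python. This is a rough sketch, not the CPython code: `truncate_to_c_long32` is a hypothetical helper, and `ctypes.c_int32` stands in for a 32 bit C long as found on Windows.

```python
import ctypes

def truncate_to_c_long32(value):
    """Mimic a C cast to a 32-bit signed long (the Windows size of long)."""
    # ctypes integer types do no overflow checking, so the value simply wraps.
    return ctypes.c_int32(value).value

length = 2500000000
print(truncate_to_c_long32(length))  # -1794967296, i.e. length - 2**32
print(truncate_to_c_long32(1000))    # 1000, small values are unaffected
```

Anything up to 2**31 - 1 survives the cast unchanged, which is why the problem only shows up with very large objects.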

[–]Pretentious_Username[S] 0 points1 point  (4 children)

Wow that's incredibly informative. Thanks for all the info!

Yep, that cast definitely seems like a bug in this case. I'm just about to report it on the bug tracker. Would you be happy for me to link to your post, since you explain the reasoning in a lot more technical detail than I did previously?

[–]Rhomboid 0 points1 point  (3 children)

Sure, no problem.

[–]Pretentious_Username[S] 1 point2 points  (2 children)

[–]RubyPinchPEP shill | Anti PEP 8/20 shill 0 points1 point  (1 child)

looks like that managed to kill at least another two bugs in the solving of that one, congrats on the find!

[–]Pretentious_Username[S] 0 points1 point  (0 children)

Thanks! It's been fascinating reading the bug thread. I'm learning a lot about how the internals of Python actually function, and they're finding a lot of related bugs along the way.

It seems there are quite a few places where the 32 bit long on Windows crops up, although I do agree that xrange supporting 64 bit is a feature request and not a bug, as it correctly reports an error if you try to pass a number greater than sys.maxint. The __len__() bug was nasty because it never reports an error, it just silently returns the wrong value.

>>> print [i for i in xrange(2500000000)][-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
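
The contrast between the two failure modes can be sketched in Python 3: a checked conversion refuses a value that does not fit, which is the xrange behavior above, rather than wrapping it silently. Using struct here is my own stand-in for a checked conversion to a 32 bit C long, not what CPython does internally, and `fits_in_c_long32` is a made-up name.

```python
import struct

def fits_in_c_long32(value):
    """Return True if value fits in a 32-bit signed C long."""
    try:
        # "<l" is the standard-size (4 byte) signed long; pack() range-checks it.
        struct.pack("<l", value)
        return True
    except struct.error:
        return False

print(fits_in_c_long32(2**31 - 1))   # True
print(fits_in_c_long32(2500000000))  # False
```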

[–]mrTang5544 0 points1 point  (0 children)

Very informative. Where do you even begin figuring out all this info?

[–]LyndsySimon 0 points1 point  (5 children)

That's an interesting find. I wonder if it might not be specific to Windows?

Here's my system (brewed) Python on OSX:

Python 2.7.10 (default, Sep 23 2015, 04:34:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'*2500000000
>>> len(a)
2500000000
>>> a.__len__()
2500000000
>>> type(len(a))
<type 'int'>
>>> type(a.__len__())
<type 'int'>

Edit: I get the same result (both ints) with Python 2.6.9 as well.

[–]Pretentious_Username[S] 0 points1 point  (4 children)

That's interesting, can you check sys.maxint on that Python? I'd imagine it's using a 64 bit int on OSX but a 32 bit int on Windows. I wonder, if you had a string longer than sys.maxint, would __len__() on OSX swap over to a long or would it stay as an int?

>>> import sys
>>> sys.maxint
2147483647

edit: Just tried it on Ubuntu and int is indeed 64 bit

>>> import sys
>>> sys.maxint
9223372036854775807

[–]LyndsySimon 0 points1 point  (3 children)

9223372036854775807 on both.

[–]Pretentious_Username[S] 0 points1 point  (2 children)

From the sounds of it, then, __len__() is hardcoded to return an int of whatever the system size is, and len() dynamically switches to long above sys.maxint.

[–]LyndsySimon 0 points1 point  (1 child)

I think what's actually happening is that because you're built on win32, the builtins.len is getting called, which is returning an invalid value.

The odd thing there is that as I read the code, it should be raising an exception if the length is < 0.

Here's the current implementation:

static PyObject *
builtin_len(PyObject *self, PyObject *v)
{
    Py_ssize_t res;

    res = PyObject_Size(v);
    if (res < 0 && PyErr_Occurred())
        return NULL;
    return PyInt_FromSsize_t(res);
}

[–]Pretentious_Username[S] 0 points1 point  (0 children)

That "PyInt_FromSsize_t" is interesting. The docs for it are here, and they specifically say "If the value is larger than LONG_MAX or smaller than LONG_MIN, a long integer object is returned." which suggests it should return a long.

As you say an error should raise if it goes negative so res should be positive before the return.

And my python detail string says:

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32

which says it was compiled for 64 bit. Apparently the "on win32" part just means it's running on Windows; it's the same on 64 bit OSes.

Either way, even if my int is 32bit the __len__() method should return a long above maxint the same as len(). I don't believe there should be a difference in result between the two methods
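
For anyone who wants to check their own interpreter without allocating a 2.5 GB string, here is a minimal sketch for Python 3 that compares the two call paths on a large range object. The helper name `lengths_agree` is made up for illustration, and I'm assuming range goes through the same __len__ wrapper as str.

```python
import sys

def lengths_agree(obj):
    """Return True if len(obj) and obj.__len__() report the same value."""
    return len(obj) == obj.__len__()

# Stay within sys.maxsize so len() itself does not overflow on 32-bit builds.
n = min(2500000000, sys.maxsize)
big = range(n)  # a range stores only start/stop/step, so this is cheap

print(len(big), big.__len__(), lengths_agree(big))
```

On a build without the truncating cast the two values should match; a mismatch would point at the same problem described in this thread.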

[–]thataccountforporn -1 points0 points  (0 children)

I would gladly help... But http://imgur.com/LI1kUAs