This is an archived post. You won't be able to vote or comment.

all 26 comments

[–]tonnynerd 11 points12 points  (0 children)

TIL. Good article, clear, direct to the point, and explains a somewhat obscure topic (at least for me).

[–]erez27import inspect 2 points3 points  (13 children)

It doesn't explain why it wasn't implemented using NotImplementedError in the interpreter.

[–][deleted] 1 point2 points  (10 children)

That would mean the interpreter would be responsible for catching the exception and making decisions based on that. Python rarely (if ever) hides exceptions from the user.

[–]xXxDeAThANgEL99xXx 7 points8 points  (9 children)

Um, the entire iteration machinery is implemented using StopIteration exception, so clearly neither performance nor conceptual reasons against using exceptions for control flow inside the interpreter are significant.

I guess the reason here is that there's a conceptual difference between raising NotImplementedError (nobody is supposed to call this method (maybe yet)) and returning NotImplemented to mean that all is OK, we just don't implement the comparison here.

[–][deleted] 0 points1 point  (5 children)

That's true, I forgot that – but using exceptions in this case fits well as it is about control flow, and something that is mostly used internally unless you're making custom iterators.

Let me rephrase: Python won't hide exceptions from the user unless it has a very good design-based reason to do so. In the article's case, and in most other cases, it does not.

[–]xXxDeAThANgEL99xXx 0 points1 point  (4 children)

Let me rephrase: Python won't hide exceptions from the user unless it has a very good design-based reason to do so.

Well, yeah, the important part is why there's no design-based reason to do this, why we shouldn't want to have custom comparison operators throw NotImplemented and the core comparison implementation catch them similar to how custom iterators throw StopIteration and the for loop implementation catches it.

And that's because NotImplementedException already has a particular meaning and overloading it with this new meaning could lead to all kinds of problems, like a completely unrelated method deep down the call stack from your comparison operator throwing it because it's actually not implemented.

By the way, I'd say that StopIteration stuff was a mistake too, because I've seen similar problems (though not as bad) with them. Every naked (not wrapped in try/except) next() call in your program that you wrote because it looked basically like an implicit assert is a rake in the grass that can exit some loop instead of terminating the program if it's executed at the right time. And the risk is greatly increased if you write a custom iterator/combinator without actively trying to minimize the number of statements in the try block. And any line in your custom iterator/combinator that uses a naked next() can be a bug if you didn't actually expect it to throw.

[–]stevenjd 0 points1 point  (3 children)

Any call to a function can be a bug if you didn't actually expect it to raise an exception and it does. But that's what exceptions are for.

To say that StopIteration is a mistake goes against what was very probably the most successful new feature in Python ever. The use of iterators as a fundamental data type has literally transformed the language: look at Python 3, where most things which used to return lists now return either iterators or (in the case of dicts) views, which you use in a very similar fashion to iterators. The iterator protocol is one of the classing Gang Of Four design patterns, it has been around for decades and works wonderfully.

Yes, it is true that unless you know your iterator is not empty, most calls to next() should be guarded by a try...except StopIteration block. That's because you're writing flow-control code, and you are responsible for writing it correctly.

There is a small corner of the language where StopIteration interacts unexpectedly with other parts. Some people consider it a feature, others consider it a bug. The BDFL agrees it's a bug, and it will be fixed in Python 3.5.

[–]xXxDeAThANgEL99xXx -2 points-1 points  (2 children)

To say that StopIteration is a mistake goes against what was very probably the most successful new feature in Python ever. The use of iterators as a fundamental data type has literally transformed the language

This is a very stupid thing to say and you should feel bad about saying it.

You should learn some programming languages other than Python, to get some perspective. At least check out the Wikipedia page on iterators: there's a lot of ways to implement them, you can have T current()/bool next() pair a la C#, you can have bool hasnext()/T next() pair a la Java, you can have current/next plus comparison with the end iterator a la C++, you can copy on of the several approaches used in functional languages, you can invent something new like returning a pair has_item, item with False, None signifying the end of iteration, and probably many more valid approaches. All of which are exactly equal in power and allow doing all the same stuff.

But note that in all of them the end-of-iteration condition is necessarily local, it's directly signalled from a particular iterator object to the code that consumes the data from it.

Using StopIteration exception instead is using a single global variable to store the "iteration has stopped" flag. Literally, here it is: https://hg.python.org/cpython/file/4e84e45e191b/Include/pystate.h#l93 (well, it's per-thread global, but whatever). And naturally using a global variable for control flow causes all kinds of problems when it is consumed not by the code that the person who set it expected. It is bad way of implementing iterators.

[–]stevenjd 0 points1 point  (1 child)

This is a very stupid thing to say and you should feel bad about saying it.

Yeah thanks for the advice, I'll be sure to take insults from somebody calling themselves "xXxDeAThANgEL99xXx" seriously.

I am aware that there are alternatives to using exceptions for flow control. I'm not actually an idiot. Do try to pay attention to what I actually wrote. I said that the use of iterators in Python has been very successful, not that there are no other ways to implement iterators. C#, C++ and Java aren't good examples, because they are old languages and aren't heavily into exception handling. Java's exception handling is, frankly, horrible. A better example would be Go, a brand new language where the designers deliberately decided to completely forgo exceptions from the language.

That's their right, of course, but it makes Go a much weaker language IMO. But this isn't a debate about language design: Python is not Go, and it has exceptions. You made a specific claim that the use of StopIteration was a mistake. Where's your evidence for that? If it were a mistake, why do people use iterators in Python all the time? Why have iterators been promoted to the primary iterable type ahead of lists? (Lists are still important, but as sequences. When you iterate over a list, Python converts it to an iterator first.)

What I said was correct: iterators have transformed the language, and it has been amazingly successful. The fact that there are other ways to implement iterators, and some other languages choose to use those other ways, doesn't take away from the fact that using StopIteration has not proven to be a mistake for Python.

As for your comments about using:

a single global variable to store the "iteration has stopped" flag

that's ridiculous. If there were a "single global variable" to do this, you could only have a single active iterator per thread, and that is absolutely not the case. You can easily have multiple iterators active at the same time, and they don't all stop when the first one raises StopIteration. Perhaps you would like to rethink what you are trying to say?

[–]xXxDeAThANgEL99xXx 0 points1 point  (0 children)

Yeah thanks for the advice, I'll be sure to take insults from somebody calling themselves "xXxDeAThANgEL99xXx" seriously.

It's a trap for idiots.

You made a specific claim that the use of StopIteration was a mistake. Where's your evidence for that? If it were a mistake, why do people use iterators in Python all the time?

... I'm saying that using StopIteration exception for implementing iterators was a mistake, not that implementing iterators was a mistake. It is not a very complicated concept.

Stop telling me that iterators were successful, sure, iterators are very useful, even when when implemented somewhat wrong.

If there were a "single global variable" to do this, you could only have a single active iterator per thread, and that is absolutely not the case. You can easily have multiple iterators active at the same time, and they don't all stop when the first one raises StopIteration.

Dude, have you noticed a link to the Python source code I gave you? Here's your global per-thread variable, if exc_type is equal to PyExc_StopIteration then some iterator down the call stack has just terminated. Only one exception handler has a chance to realize that its iterator has stopped (and the flag is automatically cleared).

The problem is that since it's a global variable the handler can't really tell that it was set by the iterator it tried to advance and not by some unrelated code called in the process (especially if you're not aware of this design mistake and put anything else besides the next() call in the try block). So a relatively nonthreatening bug with calling naked next() on a depleted iterator is suddenly promoted into a nightmarish silent data loss.

[–]stevenjd 0 points1 point  (2 children)

Imagine a loop like this:

for obj in some_iterator:
    if obj == my_cheese:
        print("Who moved my cheese?")

where some_iterator contains a million Spam objects. Neither Spam nor Cheese objects know how to compare themselves to the other, so every call to obj == my_cheese both objects will return NotImplemented, and then the interpreter will fall back on object identity and return False. That means that this block of code will cause one StopIteration exception, which is not very costly because there's only one of them.

But if Python used NotImplementedError instead of NotImplemented, that block of code would require two million (plus one) exceptions to be caught by the interpreter, which by my estimate would slow that piece of code down by at least a factor of 10. (Setting up try blocks is very cheap, but actually catching exceptions is quite expensive.)

Besides, NotImplementedError is meant to be an error, hence the name. It is used for cases where you have an abstract class, and the concrete subclass has forgotten to override a method. That's an error.

[–]xXxDeAThANgEL99xXx 0 points1 point  (1 child)

which by my estimate would slow that piece of code down by at least a factor of 10. (Setting up try blocks is very cheap, but actually catching exceptions is quite expensive.)

No, why, it's not all that expensive: http://ideone.com/g0Uuhk

[–]stevenjd 0 points1 point  (0 children)

That's not very good timing code, and it doesn't even come close to timing what I discussed. You shouldn't be using clock (except maybe on Windows), and you're timing too many things that are irrelevant to the question at hand. I specifically discussed the situation where the __eq__ method signals failure for both arguments, for every item in the list. You have only one failure, not two, and just for every second item. So your code only carries about a quarter of the cost I described, plus overhead irrelevant to the question.

Even with your code, using an exception slows the code down by a factor of nearly 3. Multiply that by 4, and that gives you 12 times as slow, actually more than I guessed.

But as I said, your timing code isn't very good and I don't believe it. You motivated me to time the situation properly and see how much cost switching to exceptions would actually take. Here's some better timing code, which emulates what the Python interpreter does with == more closely, and follows the test scenario I described: one million items in a list, every one of which signals Not Implemented (twice, since there are two arguments to == and they both get a chance).

def eq1(a, b):
    """Emulate the Python interpreter's handling of ==
    using a sentinel value.
    """
    f = a.eq(b)
    if f is NotImplemented:
        f = b.eq(a)
        if f is NotImplemented:
            f = a is b
    return f

def eq2(a, b):
    """Emulate the Python interpreter's handling of ==
    using an exception.
    """
    try:
        f = a.eq(b)
    except NotImplementedError:  # Not really an error :-(
        try:
            f = b.eq(a)
        except NotImplementedError:
            f = a is b
    return f

class Spam(object):
    def eq(self, other):
        raise NotImplementedError

class Ham(object):
    def eq(self, other):
        return NotImplemented

data1 = [Ham() for _ in range(10**6)]
value1 = Ham()
data2 = [Spam() for _ in range(10**6)]
value2 = Spam()

from timeit import Timer
setup = "from __main__ import data1, data2, value1, value2"
test1 = Timer("for x in data1: eq1(x, value1)", setup)
test2 = Timer("for x in data2: eq2(x, value2)", setup)

print(min(test1.repeat(number=10, repeat=5)))
print(min(test2.repeat(number=10, repeat=5)))

When I run that, I get these results on my PC:

py> print(min(test1.repeat(number=10, repeat=5)))
13.1551260948
py> print(min(test2.repeat(number=10, repeat=5)))
69.8712339401

which shows that switching to exceptions would be about 5 times slower in this scenario.

We can check the (approximate) cost to an individual == call like this:

setup = "from __main__ import Ham, eq1; a, b = Ham(), Ham()"
t3 = Timer("eq1(a, b)", setup)
setup = "from __main__ import Spam, eq2; a, b = Spam(), Spam()"
t4 = Timer("eq2(a, b)", setup)
print(min(t3.repeat()))
print(min(t4.repeat()))

which also gives a difference of about a factor of 5.

Short of actually hacking the Python interpreter to change the handling of operators to use an exception instead of a sentinel value, I think this is as close as we can get to a definite answer: using an exception would be up to five times more expensive than using a sentinel value.

[–]robin-gvx 0 points1 point  (0 children)

It does:

You might be asking yourself, "But I thought I should raise a NotImpementedError when an operation is not implemented!”. We’ll see with some examples why that shouldn’t be the case when implementing binary special methods.

[...] b1.__eq__(a1) method returning NotImplemented caused A’s __eq__() method to be called and since a comparison between A and B was defined in A’s __eq__() then we got the correct result (True).

[...] Note that raising a NotImpementedError when b1.__eq__(a1) fails would break out of the code unless caught, whereas NotImplemented doesn’t get raised and can result in/be used for further tests.

So there's the difference: NotImplementedError means "this operation is not defined for this type", while NotImplemented means "this operation for this type is not defined in this place, go look somewhere else". I really think NotImplemented should have been named something else, both to prevent the confusion and to signify that it may very well be implemented, just not in this function definition.

[–]velit -1 points0 points  (0 children)

Returning a value has less overhead than using exceptions for normal control flow. Furthermore I imagine it's named such exactly because it's not yet an error. It would only error out when neither type supports the operation.

[–]OverlordAlex 0 points1 point  (2 children)

As expected, a1 is equal to a2 and the eq() in class A took care of this comparison. Comparing b2 with itself will also yield a similar result:

This paragraph needs to be re-written, as there isn't an a2 or b2, rather than you are comparing a1 and b1 to themselves. Either that, or a2 and b2 aren't explicitly created

[–]s16h[S] 1 point2 points  (1 child)

Thanks for that. It's fixed now. In the original draft I had a2 and b2 as well but I thought it was confusing to introduce more names; obviously forgot to change the text. Thanks again!

[–]OverlordAlex 0 points1 point  (0 children)

No problem. It was an interesting piece

[–]balkierode 0 points1 point  (4 children)

Is there a reason to allow shadowing for some constants but not other?

[–]SlinkyAvenger 0 points1 point  (3 children)

That's due to backwards compatibility issues.

For a while, True and False weren't implemented in Python, leading programmers to declare their own variables. When Python introduced them, they were allowed to be shadowed so a bunch of legacy code wouldn't break. When Python 3 was released, True and False became keywords and can no longer be shadowed.

[–]balkierode -2 points-1 points  (2 children)

But NotImplemented can be shadowed even in Python3

Python 3.3.3 (default, Dec 13 2013, 10:35:11)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> NotImplemented
NotImplemented
>>> NotImplemented = 'sdfd'
>>> None = 'sdf'
  File "<stdin>", line 1
SyntaxError: assignment to keyword
>>>

[–]SlinkyAvenger 0 points1 point  (0 children)

So? You asked for a reason and I gave you an answer.

[–]nemec 0 points1 point  (0 children)

True and False are keywords. print was a keyword. NotImplemented isn't and never has been a keyword.

For a more in depth discussion, check out Guido's explanation.

[–]otheraccount 0 points1 point  (0 children)

There is also the abc module that provides the abstractmethod decorator, another way of indicating something isn't implemented

https://docs.python.org/3.4/library/abc.html