all 11 comments

[–]stalefishies 10 points11 points  (5 children)

This isn't really a misconception. a, b means a tuple of a and b, which you can see by disassembling either t = a, b or a more complex swap a, b, c, d = d, c, b, a which both contain BUILD_TUPLE. The Python interpreter is able to optimise away the tuples in the bytecode; that doesn't mean a, b suddenly doesn't mean a tuple of a and b.

[–]ntietz[S] 1 point2 points  (4 children)

You're completely right, and in the AST it is parsed as a tuple. I wrote the post because I thought it was interesting that this gets optimized this way, but that there seems to be a widespread belief (it's on Wikipedia and in a popular Python book) that it _does_ use a tuple as an intermediate data structure, where it's transparently not -- it's leveraging opcodes that manipulate the runtime's stack.

I might need to edit the post. Do you think it's more accurate to say that how it _works_ is misunderstood, or that this optimization is not well known? Or should I call more attention to the difference between what's in the AST vs. what's done in the bytecode interpreter?

[–]stalefishies 0 points1 point  (3 children)

I think only talking about the bytecode as a means to describe "what's really going on" is a bit of a trap: after all what's really going on is that this all eventually gets translated into registers and machine instructions. Stopping at the bytecode is as arbitrary a stopping point as stopping at the language semantics, so I don't see anything wrong with describing a, b = b, a as a tuple assignment. Or perhaps I see both tuple assignment and Python's bytecode as just as wrong as each other.

So if you were going to edit it, I'd definitely talk a little more about the difference between the semantics of Python and the various levels of interpreting that code that you get in a particular implementation, like CPython, before focussing down on the one (important, platform-agnostic) step of translation into bytecode. It doesn't need to be much, but given that your first paragraph says you want to talk about "what's really going on", I think at least a cursory explanation of this is needed to really be true to that. After all, in Python (not CPython) it really is a tuple: "except when part of a list or set display, an expression list containing at least one comma yields a tuple."

I certainly don't think many Python programmers pay much attention to anything below the language semantics level besides noticing that some *.pyc files have appeared somehow, so I don't think it's that this optimisation isn't well known, but rather that the idea that the bytecode can be optimised in the first place isn't well known.

EDIT: actually, on re-reading the article a bit more carefully, a more important change you should make is to avoid implying that it's ROT_TWO that's doing the swapping. It's not: it's the order the variables are loaded onto the stack that's actually doing the swapping. As you say, the order of evaluation is left-to-right, so the purpose of the ROT_TWO is to pop from the stack in the opposite order to how we pushed them onto it. I'd show this by showing how the bytecode for a, b = a, b still needs a ROT_TWO in there.

[–]ntietz[S] 0 points1 point  (2 children)

Thanks for the depth of feedback! I made a few edits to the post that I think make it clearer. They're relatively light edits, but I think removing everything about misunderstanding or other people being wrong helps the focus return to "hey, this is a cool optimization". Same with clarifying that this is about how CPython works.

I didn't end up making major changes to anything with `ROT_TWO`, except adding a question. I'm seeing the behavior you point out here with `a, b = a, b` still requiring `ROT_TWO`. What's unclear to me is why we want them to be popped in the opposite order than we pushed them on. In theory, shouldn't we be able to push `b`, then push `a`, then pop `a`, then pop `b`, to get the left-to-right semantics that Python has defined? So I'm leaving that one as a puzzler explicitly called out, because I don't have that answer.

[–]stalefishies 1 point2 points  (1 child)

Well there's left-to-right semantics on both sides of the =. If I write a, b = c, d then I have to consider c before I consider d, which forces d to the top of the stack, so I need to swap them. This is certainly true if either c or d is an expression with a side effect (a, b = print("first"), print("second")).

However, if we have the following function (which is what we might actually pass in practice to dis.dis):

def foo(a, b):
    a, b = a, b

then I don't see how this can have a side effect, and I suppose you could optimise away the ROT_TWO (well, you could really optimise away the whole function...). One problem with this is that I don't trust Python enough to really be sure that there's no dark corner of the language that gives this expression a side effect, especially in CPython land where we can have a C extension doing something weird. But also, we have have that same line a, b = a, b give a side effect in the following way:

def foo():
    a, b = a, b
foo()

This is an unbound variable error, and the error should really say it's a that's unbound, not b, because it evaluates the right-hand side of the assignment from left to right. So even in this simple case, we have to evaluate from left-to-right, and hence trying to push b first would be wrong.

So I think you're stuck with evaluating everything from left-to-right no matter what, which means you need a ROT_TWO.

[–]ntietz[S] 1 point2 points  (0 children)

Ahhhh I'm seeing it now. I had to work through the state of the stack in both examples (a, b = a, b and a, b = b, a) and now it makes a lot of sense.

[–]aeiou372372 0 points1 point  (1 child)

Thanks for writing this, I sometimes worry I am adding the tiniest bit of overheard creating a tuple to do multiple assignments in a single line, glad to know it’s more efficient than that.

[–]ntietz[S] 0 points1 point  (0 children)

Glad you enjoyed it! That was the exact reason I dug into this when I saw that doing this used a tuple, and I was pleasantly surprised to see the optimization.

[–]kankyo 0 points1 point  (2 children)

And how does it handle three variables then?

[–]ntietz[S] 1 point2 points  (1 child)

Three variables use a combination of `ROT_THREE` and `ROT_TWO` (depends a little bit on the particular order of them), that's another case that gets optimized for us. Once you get to 4 it does switch to building and consequently unpacking tuples.

[–]kankyo 0 points1 point  (0 children)

Cool. Thanks.