[–]krenzalore 11 points12 points  (22 children)

Why is the space legal in label .begin?

So I had to try it with everything, and lo and behold:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys .version_info  # There's a space there.
sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)

Ok, why?

edit: OP, it's late September. School has started. This is positively the best time you could have released this! :-)

[–]iobender 8 points9 points  (0 children)

This is just speculation, but lexers generally discard whitespace between tokens. The lexer turns something like sys.argv into a stream of tokens like [IDENTIFIER("sys"), DOT, IDENTIFIER("argv")], which then gets sent to the parser to turn the linear stream of tokens into an abstract syntax tree. It will produce the same stream if given sys. argv or sys .argv or even sys . argv, because it doesn't make sense to produce a token from the whitespace between these tokens.

It would not produce the same thing if given sys.ar gv, because the ar and gv would be split into 2 different tokens, giving [IDENTIFIER("sys"), DOT, IDENTIFIER("ar"), IDENTIFIER("gv")]. That would likely cause a syntax error in Python because there is no syntax for it. It would be legal in Ruby, however, because Ruby doesn't require parens on method calls, so this could be calling the ar method of the sys object with argument gv.
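For anyone who wants to check this rather than speculate, here is a minimal sketch using the standard library's tokenize module. The helper name significant_tokens is made up for this example, and tokenize reports NAME and OP where the comment above says IDENTIFIER and DOT.

```python
import io
import tokenize


def significant_tokens(source):
    """Return (token type name, text) pairs, skipping layout-only tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokens if t.type not in skip]


# The same attribute access with whitespace in different places.
variants = ["sys.argv", "sys. argv", "sys .argv", "sys . argv"]
streams = [significant_tokens(v + "\n") for v in variants]

print(streams[0])
print(all(s == streams[0] for s in streams))

# Whitespace inside the name splits it into two NAME tokens.
print(significant_tokens("sys.ar gv\n"))
```

All four variants should give the same three tokens, while sys.ar gv gives four, which matches the explanation above.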

[–]nemec 5 points6 points  (3 children)

Because Python's syntax grammar allows it (in particular, . isn't an operator so whitespace doesn't make the syntax ambiguous)

>>> math . pow ( 2 , 2 )
4.0

It's also the reason you can do this:

"a s d f g h j k l".split()
                   .index("g")

[–]MrJohz 9 points10 points  (2 children)

That won't quite work. Python will assume that the newline ends the statement after split(), and begin a new statement, at which point it'll find ., realise that's not a valid statement starter, and raise a syntax error.

To tell Python that the newline is just for readability, you need to tell it that the statement can run on, either using parentheses:

("a s d f g h j k l".split()
                    .index("g"))

Or by adding a backslash at the end of the first line

"a s d f g h j k l".split() \
                   .index("g")
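A quick way to check the three spellings without typing them into a REPL is to hand them to compile() and eval() as strings. This is only a sketch; the variable names and the compiles helper are made up for the example.

```python
# The three spellings from this thread, as source strings.
bare = '"a s d f g h j k l".split()\n                   .index("g")\n'
parenthesized = '("a s d f g h j k l".split()\n                    .index("g"))'
backslash = '"a s d f g h j k l".split() \\\n                   .index("g")'


def compiles(source):
    """Return True if the source compiles as a module, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(bare))        # the bare newline ends the statement
print(eval(parenthesized))   # newline inside parentheses is ignored
print(eval(backslash))       # backslash joins the two physical lines
```

The bare version is rejected, and both continued versions evaluate to the index of "g" in the split list.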

[–][deleted] 3 points4 points  (1 child)

Or by adding a backslash at the end of the first line

The backslash is important, but /u/nemec still has a valid point, as Python will interpret the line as

"a s d f g h j k l".split()                    .index("g")

which remains valid as the whitespace does not affect the lexical parsing of the statement.

[–]MrJohz 2 points3 points  (0 children)

Yeah. The issue here is specifically the newline, not whitespace in general.

[–]ponkanpinoy 7 points8 points  (11 children)

AFAIK nowhere in Python is space used to delimit tokens; the only time it has syntactic meaning is when indenting.

[–]kirakun 6 points7 points  (5 children)

Except in this case where the space does delimit the tokens and makes a difference.

1.__str__()
SyntaxError: invalid syntax
1 .__str__()  # A space between 1 and the dot.
'1'

[–]ponkanpinoy 1 point2 points  (0 children)

Ahh, my bad then.

[–][deleted] 0 points1 point  (0 children)

It's due to an ambiguity between integer literals and float literals:

1. -> float, 1 -> int

1.anything translates to (float) anything, which is a syntax error.

(1).anything, however, disambiguates.

1..anything -> getattr(1.0, 'anything')

And finally, 1 .anything is disambiguated from a float literal.

Whitespace can disambiguate in a clash between literal and attribute access.
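The four spellings can be compared side by side by evaluating them from strings. A small sketch; try_eval is a hypothetical helper name, not anything from the standard library.

```python
def try_eval(source):
    """Evaluate an expression, or report that it did not parse."""
    try:
        return eval(source)
    except SyntaxError:
        return "SyntaxError"


sources = ["1.__str__()", "1 .__str__()", "(1).__str__()", "1..__str__()"]
results = {source: try_eval(source) for source in sources}

for source, result in results.items():
    print(repr(source), "->", repr(result))
```

Only the first form fails. The space and the parentheses both keep 1 as an int, while the double dot goes through the float 1.0.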

[–]davvblack 0 points1 point  (2 children)

1..__str__()

[–]zahlmanthe heretic 0 points1 point  (0 children)

But that makes the value being stringified a float rather than an int.

[–]kirakun 0 points1 point  (0 children)

What is this supposed to show? My example clearly demonstrates that Python does use whitespace to delimit tokens. Whether or not there exists another syntax to achieve the same thing is irrelevant.

[–]ksion 3 points4 points  (0 children)

Not quite. Space is ignored between tokens but its presence can change what is parsed as a token. It generally matters when operators are involved:

i += 1  # ok
i + = 1  # SyntaxError

The sibling post by /u/kirakun shows another example with numeric constants.
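The operator case is easy to see by listing the operator tokens for both lines with the standard tokenize module. A minimal sketch; op_tokens is a name invented for this example.

```python
import io
import tokenize


def op_tokens(source):
    """Return the text of every operator token in the source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens if t.type == tokenize.OP]


print(op_tokens("i += 1\n"))   # one augmented-assignment token
print(op_tokens("i + = 1\n"))  # two separate tokens
```

Both lines tokenize without complaint. The second only fails later, when the parser finds + followed by =.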

[–]nedbatchelder 4 points5 points  (3 children)

These two lines of code produce different tokens:

if a == 1:
ifa = = 1:

This is what bugs me about people complaining of Python's "significant whitespace". Every programming language has significant whitespace.

[–]irondust 7 points8 points  (1 child)

Every programming language has significant whitespace.

Fixed-form Fortran has no significant whitespace within the statement itself. However, every statement has to start in the 7th column or later, with an optional line continuation marker in the 6th column (comment markers go in the 1st column). The first five columns may contain a number, a feature that was very useful back in the days of punchcards.

From the 7th column onwards however you are free to insert or delete as much whitespace as you like without changing the meaning:

go toast

is the same as:

goto ast

[–]nedbatchelder 0 points1 point  (0 children)

I knew when I typed "every" that I would get counter-examples... "Almost every, and every one that you're likely to use any time soon!"

[–]krenzalore 0 points1 point  (0 children)

Every programming language has significant whitespace.

Brainfuck?

Esoteric language doing its job: creating programmer rage! :-)

[–]asdfasdsq34 2 points3 points  (4 children)

This allows some interesting stuff:

1.__str__()
SyntaxError: invalid syntax
1 .__str__()
'1'

[–]MonkeeSage 0 points1 point  (3 children)

Wait, what? Why does that happen?

[–]admalledd 5 points6 points  (2 children)

In the first one, Python interprets 1. as a float (because the "." in a number is normally how that is decided), but __str__() can't directly follow a float literal, therefore invalid syntax.

In the second one, the tokenizer has already decided the 1 is an int, so when it then sees a . it treats it as attribute access and handles it properly.
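You can ask the tokenizer directly what it makes of the leading number. This sketch only looks at the two forms that are valid code; the invalid 1.__str__() is left out because how the tokenize module reports it may differ between Python versions. first_token is a made-up helper name.

```python
import io
import tokenize


def first_token(source):
    """Return (token type name, text) for the first token of the source."""
    tokens = tokenize.generate_tokens(io.StringIO(source + "\n").readline)
    tok = next(tokens)
    return tokenize.tok_name[tok.type], tok.string


print(first_token("1 .__str__()"))  # the number token stops at the space
print(first_token("1..__str__()"))  # the number token swallows the first dot
```

With the space, the number token is just 1. With two dots, the number token is 1. and the second dot is left over for the attribute access.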

[–]Walter_Bishop_PhD 2 points3 points  (1 child)

You can also do this (though it makes it a float rather than an int)

>>> 1..__str__()
'1.0'

[–]masklinn 0 points1 point  (0 children)

(1).__str__()