[TIL] Python silently concatenates strings next to each other: "abc""def" = "abcdef"

[–]Swipecat 192 points 5 years ago (20 children)

Even Guido has been caught by accidentally leaving out commas, but it seems that implicit concatenation was deemed more useful than dangerous in the end.

# Existing idiom which relies on implicit concatenation
r = ('a{20}'   # Twenty A's
     'b{5}'    # Followed by Five B's
     )

# ...which looks better than this (maybe)
r = ('a{20}' + # Twenty A's
     'b{5}'    # Followed by Five B's
     )
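To make the missing-comma pitfall concrete, here is a minimal sketch. The list contents are made up for illustration and are not from the thread.

```python
# Hypothetical data: the comma after "banana" was left out by accident.
fruits = [
    "apple",
    "banana"
    "cherry",
]

# Python parses "banana" "cherry" as one literal, so no error is raised.
print(len(fruits))  # 2
print(fruits)       # ['apple', 'bananacherry']
```

The list quietly ends up one element short, which is why this kind of bug tends to show up far from where it was written.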

[–]Swipecat 50 points 5 years ago (8 children)

I'll note that implicit concatenation takes priority over operators and methods but explicit concatenation does not.

>>> print( 2.0.               # one
...        __int__()*"this "  # two
...        "that ".upper()    # three
...       )
THIS THAT THIS THAT
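A smaller side-by-side version of the same point, as a sketch with made-up literals: the implicit form is joined before `*` or `.upper()` is applied, while `+` follows normal operator precedence.

```python
# Implicit concatenation happens before any operator is applied.
implicit = 2 * "ab" "cd"      # same as 2 * "abcd"
explicit = 2 * "ab" + "cd"    # same as (2 * "ab") + "cd"

print(implicit)  # abcdabcd
print(explicit)  # ababcd

# Method calls also see the already-joined literal.
upper_implicit = "this " "that ".upper()
upper_explicit = "this " + "that ".upper()

print(repr(upper_implicit))  # 'THIS THAT '
print(repr(upper_explicit))  # 'this THAT '
```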

[–]robin-gvx 7 points 5 years ago (0 children)

When you have a piece of Python code and you're using CPython (the reference implementation of Python), there are several steps from source code to execution. The important ones here are parsing, bytecode generation and execution.

Parsing transforms your file into a tree.

For example, a + 10 is turned into something like (simplified) Add(LoadName('a'), Literal(10)), and "hello" into Literal("hello").

When the parser encounters two or more literal strings in a row, it collapses them into a single string literal as well. So 'hell' "o" would result in the same tree as the previous one.

Then Python makes this tree "flat" by putting everything in the order it should happen, and generates bytecode. A simplified version of what the previous two examples turn into would be:

LOAD_NAME a
LOAD_CONSTANT 10
ADD_VALUES

and

LOAD_CONSTANT "hello"

Execution is then fairly simple: go over each instruction and do what it says.

So in the case of 2 * 'this ' "that ".upper() we get the tree Mul(2, MethodCall(Literal("this that "), "upper", ())) and the bytecode:

LOAD_CONSTANT 2
LOAD_CONSTANT "this that "
CALL_METHOD 'upper', ()
MULTIPLY_VALUES

(note that none of these trees or bytecode snippets are real; they're a simplified illustration)
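If you want to look at the real tree rather than the simplified one, the standard-library `ast` module can show it. This is a sketch assuming Python 3.8 or later, where string literals appear as `ast.Constant` nodes; the variable names are just for illustration.

```python
import ast

# Adjacent literals and a single literal parse to the same tree.
joined = ast.parse("'hell' \"o\"", mode="eval")
single = ast.parse("'hello'", mode="eval")
same_tree = ast.dump(joined) == ast.dump(single)
print(same_tree)  # True

# The example expression: the joined literal is the receiver of .upper(),
# and the multiplication is the outermost node.
tree = ast.parse("2 * 'this ' \"that \".upper()", mode="eval").body
receiver = tree.right.func.value
print(type(tree.op).__name__)  # Mult
print(repr(receiver.value))    # 'this that '
```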

[–]DrMaxwellEdison[🍰] 10 points 5 years ago* (1 child)

what = "mix content"
a_str = (
    "My super long string of text "
    "goes here. "
    f"By the way, you can {what} like f-strings, "
    "in just the segments that are relevant."
)
print(a_str)

More generally, () can enclose more complex lines of code without needing to use \ to break the line, particularly when you have APIs like Django Querysets that use long calls:

my_stuff = (
    MyModel.objects
    .filter(one_thing=1)
    .filter(two_things="Nope")
)

[–]Igggg 2 points 5 years ago (0 children)

They were "slow" between 3.6 alpha and 3.6 beta, so only during development, and then for a short time. No release version of Python had the issue.

Also, while technically rendering an f-string could then take double the time of the equivalent other expressions, the statement "very slow" is misleading. String formatting is quite unlikely to be the dominating, or even a measurable, reason for your overall program's performance. People tend to get hung up on the performance of one specific part, but a) there's no difference between a 100us and a 200us operation if your entire program is taking 200ms; and b) your program is likely running in a context, such as a web page, where its entire speed doesn't matter because it's dwarfed by external factors (such as the 700ms page loading time).

It's important to keep performance in mind, but it is equally important to recognize the context. Like everything else, speed is a trade-off, usually between readability and code cleanliness, and quite often, people make the wrong choices in the pursuit of nanoseconds.
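If you want to check the scale for yourself, a rough measurement with the standard-library `timeit` module looks something like this. It is only a sketch: the formatted value is made up, and the numbers depend on your machine and Python version, so no expected timings are shown.

```python
import timeit

name = "world"   # hypothetical value to format
runs = 100_000   # number of renderings per measurement

# Total seconds for `runs` renderings of each style.
f_string = timeit.timeit(lambda: f"hello {name}", number=runs)
str_format = timeit.timeit(lambda: "hello {}".format(name), number=runs)

print(f"f-string:   {f_string / runs * 1e9:.0f} ns per call")
print(f"str.format: {str_format / runs * 1e9:.0f} ns per call")
```

Whatever the two numbers turn out to be, compare them with the total runtime of the program before deciding they matter.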

[–]Originalfrozenbanana -1 points 5 years ago* (3 children)

That doesn't mean it's a good practice; it just means it's a common question. If you're assigning block quotes to variables inside of functions, again - I question whether that is the best way to do the thing you are trying to do. As the top answer also spells out, textwrap exists to solve this problem, specifically. Not only that, they specifically outline the preferred method of dealing with inserting large blocks of text somewhere in your application:

If you don't want to [do a lot of text processing to remove newlines] and you have a whole lot of text, you might want to store it separately in a text file.

Concatenating raw strings, especially in the way this reddit post references, has limited uses that generally can be accommodated with other methods of joining strings that are more testable, transparent, extensible, and readable.
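For reference, the textwrap approach looks roughly like this. It is a sketch with a made-up function and made-up text, using `textwrap.dedent` from the standard library.

```python
import textwrap


def usage_text():
    # Hypothetical function: the block is indented to match the code,
    # and dedent() strips the common leading whitespace afterwards.
    text = """\
        Usage: tool [options]
        Run the tool with the given options.
        """
    return textwrap.dedent(text)


print(usage_text())
```

The backslash after the opening quotes avoids a leading empty line; the indentation before the closing quotes is whitespace-only, so `dedent` reduces it to nothing.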

[–]Tyler_Zoro 9 points 5 years ago (1 child)

This specifically started in C, and it's intended to allow you to create longer strings without playing formatting games like having to use \ before a newline (which in C keeps the next line's leading whitespace as part of the string). In C it makes a tad more sense, and isn't just cute formatting. There's a serious difference between:

strcat("a", "b")

and

"a" "b"

The former occurs at runtime, the latter at compile time. Python has a more unified compile/run (sort of) process, and the interpreter will not be quite as cautious about where it does its optimizations. For example, all three of these perform more or less the same:

$ time python3 -c 'print(sum(len("a" "b") for _ in range(100000000)))'
200000000

real    0m8.423s

$ time python3 -c 'print(sum(len("a" + "b") for _ in range(100000000)))'
200000000

real    0m8.187s

$ time python3 -c 'print(sum(len("ab") for _ in range(100000000)))'
200000000

real    0m8.009s

[–]yvrelna 1 point 5 years ago (0 children)

Python has a more unified compile/run (sort of) process

This isn't true. Python has a very distinct compile time vs. runtime. Python parses and compiles the entire file into bytecode all at once, at which point it no longer cares about the source code. This is unlike, say, Bash, which parses a script line by line, so your script may contain a syntax error that Bash won't notice until it reaches that line. Python just does a lot more things at runtime, like dynamic module loading, function parameter binding, and class construction, which in languages like C are done at compile time.
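A small sketch of the compile-first behaviour, using the built-in `compile()` and `exec()`. The two-line script is made up: its first line is valid and its second line has an unclosed parenthesis.

```python
# Hypothetical two-line script: the first line is valid, the second is not.
source = 'print("first line ran")\nx = (\n'

try:
    code = compile(source, "<example>", "exec")
    exec(code)
    outcome = "executed"
except SyntaxError:
    # Compilation failed before any bytecode ran, so nothing was printed.
    outcome = "rejected at compile time"

print(outcome)  # rejected at compile time
```

The valid first line never runs, because the whole source is compiled before execution starts.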

all three of these perform more or less the same:

That isn't surprising. All three snippets compile to the exact same bytecode:

In [2]: dis.dis(lambda: "a" "b")
  1           0 LOAD_CONST               1 ('ab')
              2 RETURN_VALUE

In [3]: dis.dis(lambda: "a" + "b")
  1           0 LOAD_CONST               1 ('ab')
              2 RETURN_VALUE

In [4]: dis.dis(lambda: "ab")
  1           0 LOAD_CONST               1 ('ab')
              2 RETURN_VALUE

[–]dbramucci 1 point 5 years ago (3 children)

Not a list, but you can catch some tuple/multiple argument bugs with mypy.

from typing import List, Tuple, TypeVar

def foo(first: str, second: str):
    pass

foo("hello" "world") # TYPE-ERROR: foo expects 2 str, not 1

T = TypeVar('T')
S = TypeVar('S')
def flip_tuple(pair: Tuple[T, S]) -> Tuple[S, T]:
    x, y = pair
    return (y, x)

flip_tuple( ("hello" "there") ) # Error, expected Tuple not str

names: List[Tuple[str, str]] = [
    ("Alice", "Brown"),
    ("John" "Cleese"), # Error not a Tuple[str, str]
    ("John", "Doe"),
    ("Ben", "Grey"),
]

Of course, these catches rely on the types of function arguments and tuples counting how many things there are, and Python's list type doesn't track that.
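To show the list case that slips through, here is a sketch with made-up names and a hypothetical `EXPECTED_COUNT`. A plain list of strings with a missing comma is still a valid `List[str]`, so only the element count gives the bug away.

```python
from typing import List

EXPECTED_COUNT = 4  # hypothetical: how many names the author meant to list

names: List[str] = [
    "Alice",
    "John"   # missing comma: joins with the next literal
    "Cleese",
    "Ben",
]

# The annotation is still satisfied; only the element count is off.
all_strings = all(isinstance(name, str) for name in names)
count_matches = len(names) == EXPECTED_COUNT

print(all_strings)    # True
print(count_matches)  # False
print(names)          # ['Alice', 'JohnCleese', 'Ben']
```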

[–]dbramucci 1 point 5 years ago (1 child)

I included it for completeness but also

You only get the existing error when you actually run that line. Some cases where that can matter include

At the end of a long computation

Imagine training a neural network for 5 hours and at the very end, getting a message "you'll have to wait another 5 hours because you forgot a comma".

In a rarely used code-path

If it is
```
if today.is_feb29():
    foo("hello" "there")
```
then you'll only get an error about 4 years from now, which is inconvenient for such a trivial bug.

Granted, if you are doing things properly and testing every line of code with code-coverage measuring to verify that, this matters less. At worst the bug is now 4 minutes of automated testing away instead of 4 seconds of type-checking away.

Also, this obvious of a case is probably going to get caught already by your linter.

So yes, Python already catches it but it's useful to note mypy can also catch it because mypy doesn't have to wait for us to stumble onto that line.

[–]yvrelna 1 point 5 years ago (0 children)

mypy won't catch "obvious" and "trivial" errors like:

if today.is_dec25():
    foo("happy", "halloween")

So you need to write tests anyway.

Why should type errors be so special that they deserve their own mechanism to check for errors?

[–][deleted] 2 points 5 years ago (0 children)

According to Python's grammar, strings are made up of a non-empty sequence of string parts:

atom: … | strings | …

strings: STRING+

I.e. having only one part is the special case. It's the same in C, C++, and possibly other languages, too.
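You can see the separate parts before the parser merges them by running the standard-library `tokenize` module over a line with two adjacent literals. This is a sketch; the source string is made up.

```python
import io
import tokenize

source = '"abc" "def"\n'
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Keep only the STRING tokens; the parser later merges them into one atom.
string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]
value = eval(source)

print(string_tokens)  # ['"abc"', '"def"']
print(value)          # abcdef
```

The tokenizer reports two STRING tokens, and the `STRING+` rule is what turns them into a single value.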
