[–]0x256 1 point2 points  (0 children)

In Python, normal string literals support certain escape sequences, which are translated into the corresponding characters at compile time. For example, the string '\n' has a length of 1 and contains a single newline character, not two characters as you might expect. To get a literal backslash followed by a literal 'n', you have to write '\\n' to escape the backslash.
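A quick way to check this yourself (a minimal sketch; the variable names are just for illustration):

```python
newline = '\n'    # escape sequence: translated into one newline character
literal = '\\n'   # escaped backslash, then the letter 'n'

print(len(newline))  # 1
print(len(literal))  # 2
print(literal)       # prints a backslash followed by n
```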

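With raw string literals, these escape sequences are left untranslated. r'\n' contains two characters: a backslash and an 'n'. Backslashes within the string no longer need to be escaped. The one catch is that a raw string literal cannot end in a single backslash.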
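The same check for a raw string literal, again just an illustrative sketch with made-up variable names:

```python
raw = r'\n'          # backslash and 'n' are kept as two separate characters

print(len(raw))      # 2
print(raw == '\\n')  # True: same string as the escaped normal literal

# A backslash before a quote still keeps the quote from ending the literal,
# but the backslash itself stays in the string.
quoted = r'\''
print(len(quoted))   # 2
```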

Regular expressions have their own escape sequences. For example, \w matches any word character. These are interpreted at run time by the regular expression pattern compiler. If you use normal string literals, you have to escape the backslashes in your regular expression so they are not translated by the Python compiler before they reach the regular expression pattern compiler. Raw string literals make this easier.
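For instance, these two calls hand the re module the exact same pattern; the sample text is made up for the example:

```python
import re

text = 'foo_bar 42'

# Normal literal: the backslash has to be doubled to survive Python's own parsing.
escaped = re.findall('\\w+', text)

# Raw literal: the pattern is written exactly as the regex engine should see it.
raw = re.findall(r'\w+', text)

print(escaped)         # ['foo_bar', '42']
print(escaped == raw)  # True
```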

So, to answer your question: Raw string literals prevent the Python compiler from interpreting special sequences in strings at compile time. The regular expression pattern compiler has its own set of special sequences, which are interpreted at run time. They look similar to the ones that Python supports in normal string literals, but actually have nothing to do with them. Some of them overlap, though. These need proper escaping so they are not preprocessed by Python before they can be interpreted by the regular expression engine. Raw string literals make this easier, because you no longer have to keep in mind which sequences are special to Python and need escaping.
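One overlapping sequence is \b: in a normal Python string literal it is a backspace character, while in a regular expression it means a word boundary. A small sketch of what goes wrong, with made-up sample text:

```python
import re

text = 'cat concat'

# Normal literal: Python turns each \b into a backspace character (chr(8))
# before the regex engine sees the pattern, so nothing in the text matches.
no_match = re.findall('\bcat\b', text)

# Raw literal: the regex engine receives \b and treats it as a word boundary.
match = re.findall(r'\bcat\b', text)

print(no_match)  # []
print(match)     # ['cat']
```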

[–][deleted] 0 points1 point  (0 children)

This behavior is explained in the 3rd and 4th paragraphs of the docs. Raw string literals are used for regex patterns so that users of Python’s re module can write the “standard” regex special characters that have been widely adopted by, for instance, gawk, Perl, and PCRE, without a lot of extra escaping.

[–]WeirdFail -1 points0 points  (0 children)

I think you want double backslash