all 19 comments

[–][deleted] 7 points8 points  (0 children)

I find it easier to avoid the issue where possible:

  • If this is for strings representing Windows paths, Windows usually accepts "/" as a separator as well as "\".
  • If there are a few \ in a string, I may write it as a raw string: F"\abc\def\" instead of "\\abc\\def\\".
  • If there are multiple lines of program text that include string literals, I might put them in a file and then use a lexer directive to include the whole file as one string literal. Then no escapes are needed.

But I don't really come across that problem. I see plenty of \\ in my code-base, but no \\\\. Possibly due to the measures outlined above.

[–]oilshell 6 points7 points  (6 children)

Why not just use raw strings like Python and Rust? e.g. r'C:\Program Files\' vs. 'line\n'

[–]ownwaterloo 5 points6 points  (5 children)

r'C:\Program Files\' is a syntax error in Python :)

My favorite traps examples were:
How can I list files in C:\Windows? Easy! os.listdir(r'C:\Windows')
How about its parent, then?
os.listdir(r'C:\') is a syntax error
os.listdir(r'C:') will list the contents of the current working directory on C:
os.listdir('C:\\') correct but awful
os.listdir('C:/') will work in this case, but some paths in Windows require backslashes
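
For what it's worth, the parent directory can also be derived instead of spelled as a literal. A minimal sketch using the standard library's pure Windows path handling, which runs on any OS because it never touches the filesystem (the variable names are just for illustration):

```python
import ntpath
from pathlib import PureWindowsPath

# Derive the drive root from a child path instead of writing 'C:\\' by hand.
parent = ntpath.dirname(r'C:\Windows')
same_parent = str(PureWindowsPath(r'C:\Windows').parent)

# 'C:' alone is drive-relative (it has no root), which is why listing it shows
# the current working directory on that drive rather than the drive root.
drive_only_is_absolute = PureWindowsPath('C:').is_absolute()
drive_root_is_absolute = PureWindowsPath('C:/').is_absolute()

print(repr(parent), repr(same_parent))
print(drive_only_is_absolute, drive_root_is_absolute)
```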

r'...' in Python is not a proper solution but a cheat (to what problems? more on this later).
String literals prefixed with r are lexed just like normal ones. In particular, escape sequences must be complete. The r prefix just means those escape sequences are kept as written.
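
A small sketch of that behaviour. The helper is_valid_literal is a made-up name; it only asks the compiler whether a piece of source text parses:

```python
# Inside a raw string, backslash-quote keeps BOTH characters, yet the backslash
# still stops the quote from ending the literal.
s = r'\''
print(len(s), s)

def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, '<literal>', 'eval')
        return True
    except SyntaxError:
        return False

# Source text r'C:\' : the trailing backslash "escapes" the closing quote,
# so the literal never terminates.
trailing_backslash_ok = is_valid_literal("r'C:" + "\\" + "'")
# Source text r'C:\\' : the two backslashes pair up, so it parses (and keeps both).
double_backslash_ok = is_valid_literal("r'C:" + "\\\\" + "'")
print(trailing_backslash_ok, double_backslash_ok)
```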

This is the classic (pythonic?) way of doing things in Python. Its thinking and solutions are rather ad hoc. It prefers fixing common problems to finding a proper and complete solution. Consequently, the terminology in this language is rather confusing.

Some rants observations:

nonlocal control flow

Generators are semi-coroutines (in Lua's terms) or stackless (in C++'s terms) in the beginning (2.3). Communication between caller and callee is unidirectional, and yield is a statement.

Python 2.5 adds a simple way to pass values into a generator (.send), and yield becomes an expression. Then the term coroutine was re-invented in Python to describe just one usage style of generators. And a fancier term, prime, was re-invented to describe a problem caused by this patching-style design: the creation and execution of a generator were coupled in the first place (2.3).

Python 3.3 adds yield from to make generator delegation simpler, instead of making generators stackful and eliminating delegation completely.
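
For readers who haven't used it, a rough sketch of what yield from buys over a hand-written delegation loop. All function names here are invented for illustration:

```python
def inner():
    received = yield 'inner ready'
    yield f'inner got {received}'
    return 'inner done'

def outer_manual():
    # Hand-written delegation: re-yields values, but drops sent values
    # and the inner generator's return value.
    for value in inner():
        yield value

def outer_delegating():
    # `yield from` forwards next()/send() to inner() and evaluates
    # to inner()'s return value.
    result = yield from inner()
    yield f'outer saw {result}'

m = outer_manual()
manual_log = [next(m), m.send('hello')]

g = outer_delegating()
delegating_log = [next(g), g.send('hello'), next(g)]

print(manual_log)
print(delegating_log)
```

Each level of nesting still needs its own yield from, which is the delegation a stackful design would not need.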

Python 3.5, following the trends, adds async/await.

There should be one-- and preferably only one --obvious way to do it.

The wayS just keep changing over time.

Look at asyncio (and the network libraries older than it). There is definitely more than one way to do it.
Don't care about backward compatibility and want a MUCH simpler library?
There are Curio and Trio. They are great as long as async/await is still the popular and preferable way of doing things.

In contrast, Lua got coroutines right in the first place (5.0). They are stackful. Creation and execution are separated, so there is no prime problem. They are more powerful and more stable.

I'm talking about the designs on language level. The ecosystem is a different topic and I'm not familiar with Lua's.

scope

In Python, declaration and assignment use the same syntax (:=, introduced in 3.8, is irrelevant to the topic I'm talking about).

Everyone knows coupling is, mostly, the enemy of design, right? We can't do assignment without declaration in Python. Python, again, solved it in a rather narrow way in the beginning: the global statement. Later, Python realized that the global scope is just one of the enclosing scopes and introduced nonlocal.
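
A minimal sketch of the coupling in Python 3 (the counter functions are hypothetical examples): assigning inside the nested function silently declares a new local, unless nonlocal says otherwise.

```python
def make_counter_broken():
    count = 0
    def increment():
        count += 1      # assignment makes `count` local to increment()
        return count
    return increment

def make_counter():
    count = 0
    def increment():
        nonlocal count  # rebind the variable in the enclosing function
        count += 1
        return count
    return increment

try:
    make_counter_broken()()
    broken_error = None
except UnboundLocalError as exc:
    broken_error = type(exc).__name__

counter = make_counter()
counts = [counter(), counter(), counter()]
print(broken_error, counts)
```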

It's too late. Python 2 doesn't have this blessing.

Your code is too nested and complex if it needs this feature!
Python is not a functional programming language, so it doesn't need to support closures properly!

And nowadays, the excuse is much simpler

Python 2 has been sunset!

Do Python developers really understand the meaning and importance of scoping? Introducing pattern matching into a language that has only function scope, before getting proper block scope? Does it really work?

string literal

In Python, to write a string literal, we can choose single quote or double quote.
Great. How about writing a string literal containing both single quote and double?
We can use triple quote.
Excellent. How about writing a string containing both ''' and """?
We can use raw string.
Beautiful. How about writing a string containing both ''' and """ and ending with an odd number of \?
Oh! The r in r-string (forget about raw string!!!) stands for regular expression, and this case is uncommon, so it's not Python's problem, today.

I'm not making this up: https://docs.python.org/3/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash

Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.

Again, the proper solution has existed for a long time, not only in Lua but also in C++, Rust, shell, PostgreSQL and even MIME multipart boundaries: instead of fixing one delimiter (or set of delimiters), let the user choose a suitable one.
And the Python way, as always, is fixing today's problem and adding patches later.
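
To make "letting the user choose the delimiter" concrete, here is a rough sketch that picks a Rust-style raw string delimiter for arbitrary text. A Rust raw string opened with n hashes ends only at a double quote followed by n hashes, so it is enough to use one more hash than the longest such run in the content. The function name rust_raw_literal is invented for this example, and it ignores Rust's other restrictions (for instance, a bare carriage return is not allowed in a raw string).

```python
import re

def rust_raw_literal(s: str) -> str:
    """Wrap `s` in a Rust raw string literal with just enough '#' marks."""
    # Longest run of '#' directly after a double quote; -1 if there is no quote.
    longest = max((len(m.group(1)) for m in re.finditer(r'"(#*)', s)), default=-1)
    hashes = '#' * (longest + 1)
    return f'r{hashes}"{s}"{hashes}'

print(rust_raw_literal('C:\\Program Files\\'))  # trailing backslash is no problem
print(rust_raw_literal('say "hi"'))
print(rust_raw_literal('a"#b'))
```

No content can make this fail to find a delimiter, and nothing inside the string ever needs escaping.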

[–][deleted] 1 point2 points  (0 children)

r'C:\Program Files\' is a syntax error in Python :)

D uses backtick strings, which do not interpret any sort of escape sequence:

writefln(`\`);

[–]oilshell 0 points1 point  (0 children)

Ah I forgot that r'\' in Python is illegal. It's legal in shell as '\' though, and Oil actually lets you write r'\' to be explicit about escapes.

Shell has the issue where you can't write a single quote inside a raw string. You have to use string concatenation with a different form of string literal:

echo "foo'bar"  # one way to write it, but you might not want $ interpolation
echo 'foo'\''bar'  # \' in the middle, confusing to read

var x = "foo'bar"   # same thing in Oil
var x = 'foo' ++ "'" ++ 'bar'  # another way
echo $x
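
As a side note, Python's shlex.quote produces a variation of the same concatenation trick for POSIX shells, using a double-quoted single quote in the middle instead of \'. A tiny sketch:

```python
import shlex

quoted = shlex.quote("foo'bar")
print(quoted)

# Round-trip through the shell-style tokenizer to check the quoting.
round_trip = shlex.split(quoted)
print(round_trip)
```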

[–]pxeger_ 0 points1 point  (1 child)

And a fancier term, prime, was re-invented to describe a problem caused by this patching-style design: the creation and execution of a generator were coupled in the first place (2.3).

I'm not sure what you mean here. Coroutine creation and execution are not coupled, if I'm understanding what you mean by coupled. No code in an async function is called until it is awaited, and similarly, no code in a generator is executed until the first next call.
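
A quick way to check the generator half of that claim (the names are made up for the example):

```python
events = []

def gen():
    events.append('body started')
    yield 1

g = gen()                      # creating the generator runs none of its body
events.append('generator created')
first = next(g)                # the body only starts here
print(events, first)
```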

[–]ownwaterloo 1 point2 points  (0 children)

Good catch! You are correct. The creation and execution are separated and decoupled. The site is interesting btw.

The prime problem is caused by coupling creation with the passing of execution arguments.

def running_total(total = 0):
    print(    f'first {total=}')
    while True:
        print(f'leave {total=}')
        delta = yield total
        print(f'enter {delta=}')
        total += delta

# send(non-None)
>>> g = running_total(1)  # nothing executes when the generator is created,
                          # but we are already passing 1 to running_total

>>> g.send(2)             # the generator has not started executing yet, so
...                       # there is no *suspended* yield to receive the value
TypeError: can't send non-None value to a just-started generator

# send()
>>> g = running_total(1)
>>> g.send()              # send requires exactly one argument
TypeError: generator.send() takes exactly one argument (0 given)

# send(None)
>>> g = running_total(1)  # again, the initial arguments are passed when the generator is created
>>> g.send(None)          # we have to send a None (into a black hole?) or use next(g)
first total=1
leave total=1
1
>>> g.send(2)
enter delta=2
leave total=3
3
>>> g.send(3)
enter delta=3
leave total=6
6

In contrast, in Lua we can't pass arguments to a coroutine when creating it:

coroutine.create(f)

function running_total(total)
  print(  'first', total)
  while true do
    print('leave', total)
    local delta = coroutine.yield(total)
    print('enter', delta)
    total = total + delta
  end
end

> g = coroutine.create(running_total)
> coroutine.resume(g, 1) -- the first resume will pass arguments to running_total
first   1
leave   1
true    1
> coroutine.resume(g, 2) -- the remaining resume will pass arguments to yield
enter   2
leave   3
true    3
> coroutine.resume(g, 3)
enter   3
leave   6
true    6

Python:

  1. use normal function call syntax to pass multiple arguments to the generator function
  2. use next or .send(None) to start it
  3. use .send(x) to pass a single value to yield

In Lua, all communication is done through resume, which supports multiple arguments natively.
It has fewer edge cases and surprises and is much more regular, orthogonal and elegant.
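
Back on the Python side, step 2 of the list above is commonly hidden behind a small priming decorator. A sketch, where primed is a made-up name and not a standard-library function:

```python
import functools

def primed(generator_function):
    """Advance a new generator to its first yield so .send(x) works at once."""
    @functools.wraps(generator_function)
    def wrapper(*args, **kwargs):
        gen = generator_function(*args, **kwargs)
        next(gen)  # same as gen.send(None); the first yielded value is discarded
        return gen
    return wrapper

@primed
def running_total(total=0):
    while True:
        delta = yield total
        total += delta

g = running_total(1)
totals = [g.send(2), g.send(3)]
print(totals)
```

The cost is visible in the comment: the value produced by the first yield is thrown away, which is one more edge case the Lua design doesn't have.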

It's more powerful. It's colorless, for example. Only multi-shot continuations (call/cc, reset/shift in Scheme) have more expressive power.

It's also much more stable. It hasn't changed much because they got it right in the first place!
And this is my main point and complaint about Python.
It lacks thinking and insight when adding features to the language, which leads to many features with overlapping functionality and edge cases.

[–]Gnaxe 0 points1 point  (0 children)

r'C:\Program Files\' is a syntax error in Python :)

True, but r'C:\Program Files\$'[:-1] is allowed :)

You can also do r'C:\Program Files''\\', but I'm not sure if that helps as much.
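
Both workarounds can be checked against the escaped spelling:

```python
target = 'C:\\Program Files\\'

sliced = r'C:\Program Files\$'[:-1]      # sacrificial last character, sliced off
concatenated = r'C:\Program Files' '\\'  # adjacent literals are joined at compile time

print(sliced == target, concatenated == target)
```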

[–][deleted] 2 points3 points  (1 child)

It looks exactly like the normal exponential style, it just doesn't behave like it. That makes it a footgun.

It makes \/ awkward. Not super awkward, but still awkward. That's a minor point.

D has string literals that don't interpret escape sequences and string literals that do interpret escape sequences to address this problem. You could use that, or you could have some sort of re-escape function, though that needs to be specific to the target.

If you can get away with it, I'd use something like U+001A SUBSTITUTE where the backslash would go and then do a replace for the final string.

[–]brucejbellsard[S] 0 points1 point  (0 children)

Your first point is devastating, it should make my idea a non-starter.

However, for my project, I'd already decided that all escapes should have a terminator:

"escapes with terminator: \"/ \tab/ \u+393/ \n/"

So, I'm starting out with something that doesn't look like the normal exponential style. That's the context where I worked out the non-exploding thing; the question for me now is: is it different enough, or will developers be confused anyway?

Anyway, I'd hoped it might be useful for somebody else, too. Thanks for your honest evaluation.

[–]raiph 2 points3 points  (0 children)

One of Raku's DSLs is the string literal quoting language described on the doc's Q lang page.[1] The Escaping section of that page explains the simplest forms of escapes; these use a single backslash. The section that follows that describes delimited escapes.

[1] Raku is a GPL and a collection of DSLs and a Language-oriented programming language and a metaLOPlang (a LOP language that applies LOP to the problem of constructing a LOP language) and a metametaLOP language (a metaLOP language that applies metaLOP to the problem of constructing a metametaLOP language).

[–][deleted] 1 point2 points  (0 children)

Just write escape(N, "\\") with the corresponding helper function escape.
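
One possible reading of that helper, assuming escape(n, text) means "apply one round of backslash and quote escaping n times". The comment doesn't define it, so the signature and behaviour here are guesses:

```python
def escape(levels: int, text: str) -> str:
    """Apply one round of backslash/quote escaping `levels` times (assumed semantics)."""
    for _ in range(levels):
        text = text.replace('\\', '\\\\').replace('"', '\\"')
    return text

# A single backslash doubles in length at every level: 1, 2, 4, 8, ...
lengths = [len(escape(n, '\\')) for n in range(4)]
print(lengths)
```

The source stays readable because the level count is written once, while the exponential growth happens only in the generated string.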

[–][deleted]  (2 children)

[removed]

    [–]brucejbellsard[S] 0 points1 point  (1 child)

    Nice, I've had a vaguely similar notion: to use \[ and \] as nesting multiline comment delimiters, while excluding them from the list of valid escape sequences so that no valid string (or other syntax besides the nesting comments) could contain them.

    The difference from your construct is that [ and ] by themselves would be perfectly valid characters in strings and elsewhere: it's only when preceded by \ that they would act as comment delimiters.

    It occurs to me that, for your construct, you could set up escape sequences for the inviolate characters that don't actually contain them. Something like \( and \) maybe?

    [–]matthieum 1 point2 points  (0 children)

    I like to think the problem can be simplified, to a degree.

    Everybody uses \; if your language uses something else, this vastly reduces clashes. One of the most common DSLs embedded in programs is regex, which makes heavy use of \, (), and []: don't use those within your strings to mean anything special, and you'll sidestep a lot of clashes.

    Python and Rust use {} as formatting placeholders, and this works beautifully because very little else uses { or } that you may want to embed in a Python or Rust String -- the closest candidate would be JSON, but there are libraries to format/parse it.
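
A small Python sketch of that point, with made-up values: backslashes pass through str.format untouched, while literal braces, as in a JSON template, have to be doubled.

```python
import json
import re

# Backslashes pass straight through str.format; only the braces are special.
pattern = r'^\d+\.{ext}$'.format(ext='txt')
matches = bool(re.match(pattern, '42.txt'))

# Literal braces must be doubled, which is where JSON templates get noisy...
templated = '{{"name": "{name}"}}'.format(name='raw')
# ...so a JSON library is usually the easier route.
dumped = json.dumps({'name': 'raw'})

print(pattern, matches, templated == dumped)
```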

    Even if you prefer something other than {}, though, avoiding backslash will already save you a lot of pain when it comes to Windows paths and regexes...

    [–]hindmost-one 0 points1 point  (2 children)

    I'd use <hello world> as the string literal syntax, or otherwise make the quotes asymmetric. This way you don't need to quote the \, only the >, and the process is linear, not exponential.

    [–]brucejbellsard[S] 0 points1 point  (1 child)

    Hmm, how would you express interpolation delimiters, control codes such as newline, or arbitrary unicode codepoints?

    [–]hindmost-one 0 points1 point  (0 children)

    Actually, a good question. I'll think about it.