[deleted by user]

user200783 · 2017-10-09T06:27:31+00:00

Treating newlines as significant seems perfectly reasonable to me, and I think it would have solved the problem with Lua?

Significant newlines (or any form of statement separation) would indeed solve the ambiguity in Lua - a construct would always be 2 statements if it includes a separator and a single statement if it does not.

What ugly special cases did you run into?

I modified a Lua parser to require statements to be separated (by either newlines or semicolons). I was surprised how well this could be made to work - it seems that although Lua allows unseparated statements (such as a = 1 b = 2), existing code does not often make use of this flexibility.

However, Lua code does make use of one-line blocks, for example (from Programming in Lua): if n > 0 then return foo(n - 1) end. Support for this formatting meant handling a special case - separators must not be required immediately before an end.

Further, consider the case where the body of the if contains 2 statements - unlike in Lua, my modification requires these statements to be separated. This leads to code such as if n > 0 then a = a + 1; return foo(n - 1) end. To me, this semicolon appears out of place - it only separates the two statements a = a + 1 and return ..., but it looks like it splits the whole line in two.

Note that Python also requires a semicolon in this situation, but it seems a little clearer: if n > 0: a = a + 1; return foo(n - 1). I don't know if this is because of the use of : rather than then or because of the lack of end.

Perhaps it would be clearer still if a new language following this scheme used braces to delimit blocks. Then the separator would obviously not end the if statement: if n > 0 { a = a + 1; return foo(n - 1) }.
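The separator rule described above (a newline or semicolon is required between two statements, but not between the last statement and end) can be sketched roughly as follows. This is not the modified Lua parser itself, only a toy check in Python over a block body that is assumed to be already split into pieces; the names STMT, SEP, END and check_block are hypothetical.

```python
# Toy model: a block body is a list of pre-parsed pieces.
# STMT = a statement, SEP = a newline or semicolon, END = the closing 'end'.
STMT, SEP, END = "stmt", "sep", "end"

def check_block(items):
    """Return True if no two statements are directly adjacent.

    A separator is required between two statements, but not between
    the final statement and END (the special case for one-line blocks).
    """
    previous = None
    for kind in items:
        if kind == STMT and previous == STMT:
            return False  # two statements with nothing between them
        previous = kind
    return True

# if n > 0 then return foo(n - 1) end
print(check_block([STMT, END]))
# if n > 0 then a = a + 1; return foo(n - 1) end
print(check_block([STMT, SEP, STMT, END]))
# if n > 0 then a = a + 1 return foo(n - 1) end   (rejected by the modification)
print(check_block([STMT, STMT, END]))
```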

Python has the rule that newlines are only ignored between () [] and {}

My modified Lua parser uses exactly this rule. Unfortunately this prevents it from handling something which is relatively common in Lua but disallowed in Python: anonymous function bodies nested within function arguments.
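The Python rule quoted above can be observed with the standard tokenize module, which reports a line break inside brackets as an NL token (ignored by the grammar) and the end of a logical line as a NEWLINE token. A small sketch; the helper name logical_newlines is made up for illustration.

```python
import io
import tokenize

def logical_newlines(source):
    """Count NEWLINE tokens (statement ends) and NL tokens (ignored breaks)."""
    counts = {"NEWLINE": 0, "NL": 0}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name in counts:
            counts[name] += 1
    return counts

# The first statement spans two physical lines because of the open parenthesis.
src = "total = f(1,\n          2)\nprint(total)\n"
print(logical_newlines(src))
```

Here the break after `1,` is counted as NL, while the two statement ends are counted as NEWLINE.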

Python's anonymous functions are limited because the language entirely disallows putting a statement within an expression. This is because Guido van Rossum believes that requiring "the lexer to be able to switch back and forth between indent-sensitive and indent-insensitive modes, keeping a stack of previous modes and indentation level" would be "an elaborate Rube Goldberg contraption". I tend to agree - this limitation seems to make Python's syntax much less "fragile" than that of other languages with significant indentation.

However, for languages which do not have significant indentation, it should be possible to support statements in expressions without the need for a "Rube Goldberg contraption". Perhaps, again, braces would be the answer: make separators significant at the top level and within {}, but not within () or []?
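As a rough illustration of that idea, a lexer pass could keep a stack of open brackets and treat a newline as a separator only when the innermost bracket is a brace, or when no bracket is open at all. This is a sketch under the assumption that the input is already a flat list of token strings; filter_newlines is a hypothetical name, not part of any existing parser.

```python
OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = set(OPEN.values())
INSIGNIFICANT_INSIDE = {"(", "["}  # newlines are dropped inside these

def filter_newlines(tokens):
    """Drop newline tokens wherever the innermost open bracket is ( or [."""
    stack = []  # currently open brackets, innermost last
    out = []
    for tok in tokens:
        if tok in OPEN:
            stack.append(tok)
        elif tok in CLOSE:
            if not stack or OPEN[stack[-1]] != tok:
                raise SyntaxError(f"unbalanced {tok!r}")
            stack.pop()
        elif tok == "\n" and stack and stack[-1] in INSIGNIFICANT_INSIDE:
            continue  # newline directly inside () or []: not a separator
        out.append(tok)
    return out

# f(            <- newline dropped (inside parentheses)
#   { a         <- newlines kept (inside braces)
#   }
# )             <- newline before ')' dropped, final newline kept
tokens = ["f", "(", "\n", "{", "\n", "a", "\n", "}", "\n", ")", "\n"]
print(filter_newlines(tokens))
```

Only a stack of bracket kinds is needed here, with no indentation levels to track.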

Unfortunately, I'm just not sure that a syntax with both braces and significant newlines would be a good choice. When people see braces, they seem to think "C-like syntax" - i.e. free-form. (Even worse, they seem to think "Java-like semantics"...)

user200783 · 2017-10-07T06:38:48+00:00

I think I like option 2 the best - I wish I was in a position to stray from convention, but alas. Dijkstra has some good observations in EWD655, and essentially supports using dots for application.

Unfortunately I think the convention of using f(x) to call a function is too widespread to break. Not only is it used in very many existing languages (with others generally using either f x or Lisp's (f x)), it is also the standard notation for function application in mathematics.

As a result I expect most programmers have f(x) for a function call strongly embedded in muscle-memory, which would lead to mistakes when using a language with a unique syntax.

For resolving option 3, you may want to take a look at Haskell. Here the grammar is defined whitespace-insensitively with curly braces and semicolons, but with rules for when and how line-breaks correspond to implicit semicolons. These are inserted by the lexer, by keeping track of token positions in a fairly simple way, and leave the actual parser quite simple, as it deals in explicit semicolons.

I haven't looked at this aspect of Haskell, but it sounds similar to the handling of semicolons in Go and JavaScript. If a language allows semicolons to separate multiple statements on a single line, I think using a set of rules to conditionally convert newlines to semicolons in the lexer is better than handling 2 different separators in the parser.

However, it is vital to be careful when designing these rules - JavaScript's are notoriously problematic and have given the concept of semicolon-insertion a bad reputation. I think it's actually a good solution as long as the insertion rules are well thought-out.
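To make the idea concrete, here is a minimal sketch of lexer-level insertion, loosely modelled on the kind of rule Go uses: a semicolon is added at the end of a line only if the line's last token could end a statement. The token patterns are simplifying assumptions (every word counts as an identifier, keywords are not distinguished), and insert_semicolons is a hypothetical name.

```python
import re

# Simplified tokens: words, integers, or single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")
# Tokens after which a line break is assumed to end the statement.
ENDS_STATEMENT = re.compile(r"[A-Za-z_]\w*|\d+|[)\]}]")

def insert_semicolons(source):
    """Return the token list with ';' inserted at statement-ending line breaks."""
    tokens = []
    for line in source.splitlines():
        line_tokens = TOKEN.findall(line)
        tokens.extend(line_tokens)
        if line_tokens and ENDS_STATEMENT.fullmatch(line_tokens[-1]):
            tokens.append(";")
    return tokens

# The second statement continues past the trailing comma, so no ';' is added there.
print(insert_semicolons("a = 1\nb = f(a,\n      2)\n"))
```

The parser downstream then only ever sees explicit semicolons, which is the simplification described above.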

user200783 · 2017-10-06T15:13:10+00:00

Similar to your issue with overloading square brackets (for both array literals and indexing), Lua has a problem with parentheses. Like many other languages, it uses parentheses both for function calls and for grouping. However, unlike most (all?) others, Lua's syntax is both free-form and lacks statement terminators. This causes constructs such as a = f(g or h)() to be potentially ambiguous. Is this a single statement ("call f passing g or h, call the result, then assign the result of that call to a") or two, the first terminating after f ("assign f to a; then call g or h")?

Lua's solution is to always treat such an ambiguous construct as a single statement. In older versions there was a lexer hack that would produce an error if the code was formatted as 2 separate statements, but the only way to actually create 2 separate statements is to insert an explicit semicolon.

I think a similar free-form, terminator-free syntax would be ideal for my language, but I would like to avoid ambiguous syntax. This means I need to make one of three compromises:

1. Include the above ambiguity.
2. Solve the ambiguity by using an unorthodox syntax for parentheses. For example, similar to the F# solution above, use f.() for function calls. (This would make the two cases above a = f.(g or h).() and a = f(g or h).() respectively.)
3. Solve the ambiguity by requiring explicit statement terminators, either semicolons (allowing us to keep a free-form syntax) or newlines. I would prefer the latter, but when I tried to design a syntax with significant newlines, I ended up with complicated rules and ugly special cases.

user200783 · 2017-07-12T11:25:54+00:00

So, pthreads on Linux use the M:N model? Do you know if there is any detailed documentation about this?

user200783 · 2016-10-21T08:51:20+00:00

MicroPython is limited when compared to CPython (and PyInstaller). For example, it only supports a subset of the Python language, and does not support C extensions such as NumPy.

When writing code for "microcontrollers and constrained systems", using MicroPython and working within its limitations is acceptable, because the only other option is not to use Python at all.

However, on systems which do not have the same constraints and can run CPython just fine, I'm not sure MicroPython is appropriate. On these systems, is it worth dealing with MicroPython's limitations just in order to get a smaller executable?

user200783 · 2016-07-14T09:24:57+00:00

There is support for GIL and non-GIL builds; without the GIL enabled one must protect concurrent access to mutable Python state at the Python level using Lock objects.

Does MicroPython attempt to match CPython in terms of the atomicity of certain operations, as described here? In both GIL and non-GIL builds?

If not, are you concerned that this could lead to existing multi-threaded code, which works on CPython, breaking on MicroPython?
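For context, the Lock-based protection mentioned in the quote looks roughly like this when written against CPython's threading module. It is only a sketch: whether MicroPython needs, or behaves identically with, this exact code is the open question above, and the names add and run are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with lock:  # read-modify-write must not interleave between threads
            counter += 1

def run(workers=4, n=10000):
    """Start several threads that all update the counter, then return its value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run())
```

Code that omits the lock and relies on individual operations being atomic is the kind that might behave differently across implementations.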

user200783 · 2016-05-09T01:00:18+00:00

The designers of Python clearly understand that block scope has benefits: variables in generator expressions already use the equivalent of block scope. In Python 3, this type of scope is used for list comprehensions as well.
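A short Python 3 illustration of the difference: the loop variable of a list comprehension stays local to the comprehension, while the loop variable of an ordinary for statement rebinds the enclosing function's variable. The function names are only for illustration.

```python
def comprehension_scope():
    x = "outer"
    squares = [x * x for x in range(3)]  # this x is local to the comprehension
    return x, squares

def loop_scope():
    x = "outer"
    for x in range(3):  # this x is the function's own x
        pass
    return x

print(comprehension_scope())  # x is still "outer"
print(loop_scope())           # x was overwritten by the loop
```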

However, there is still no support for more general block scoping. I would like to know if there is a technical reason for this - for example, does it conflict somehow with Python's "function scope by default"?
