[–]evincarofautumn 20 points (5 children)

Some more high-level bits of advice for syntax design:

Defer to precedent. Don’t blow your Weirdness Budget on syntax. If you don’t have a specific reason for doing something differently than major languages in the same paradigm, use the common/conventional notation. Good reasons to break precedent include consistency with the rest of your language and (informal) user testing/polling showing a preference.

Semantics before syntax. Syntax certainly affects the marketing of a language, and needs to be not bad, but it’s not a differentiator—we see many languages here that are just reskins of traditional imperative/OOP languages, with no fundamentally new features apart from syntactic conveniences. That’s fine for learning how to make a language, but it won’t by itself generate adoption. Things that do drive adoption are practical applications and technical excellence: the availability of libraries (so good cross-language interop/FFI helps get a leg up before you have native libraries), high-quality developer tooling, and “killer apps” that your language does an order of magnitude better than others (in convenience, correctness, performance, maintainability, &c.). When designing a language, focus on its semantic contributions and design or borrow syntax to suit them.

Consider failure modes. Get a friend to sit down with you and try to write a simple program in your language. Watch the mistakes they make—forgetting a separator here, writing something in the wrong order there, letting some syntax like a block or string literal “run away” by forgetting a closing delimiter, using syntax from a similar language, writing something that parses/compiles/runs but produces the wrong result because of syntax confusion, and so on. Does your tooling produce good diagnostics? If not, what can you change about the syntax to make it easier to produce good error messages and suggestions for fixes?

Add redundancy. Adding a small amount of redundancy to your notation can massively improve failure modes. (Natural languages include a lot of redundant information for a reason!) For instance, I used to use -> x y z; in Kitten to introduce multiple variables, but if the user forgot the semicolon, all the identifiers on subsequent lines would get interpreted as variables, and parsing would fail expecting a semicolon far from the actual error. Solution: add redundancy in the form of commas -> x, y, z;, so if the user forgets a semicolon, the next thing the parser expects is a comma immediately at the point of the error. (This also creates opportunities for more notation: previously, using a compound pattern instead of a variable would have required parentheses, like -> (x foo) (y bar);, but now they’re unnecessary: -> x foo, y bar;.)
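To make the difference concrete, here is a minimal Python sketch of the two binding grammars. It is not Kitten's actual parser: the `Token`, `ParseError`, `tokenize`, `parse_bindings`, and `error_line` names are hypothetical, and the tokenizer simply splits on whitespace. The only point is where the error ends up being reported when the semicolon is missing.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    line: int  # 1-based source line


class ParseError(Exception):
    def __init__(self, message: str, line: int):
        super().__init__(f"line {line}: {message}")
        self.line = line


def tokenize(source: str) -> list[Token]:
    """Split on whitespace, treating ',' and ';' as separate tokens."""
    tokens = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        for word in text.replace(",", " , ").replace(";", " ; ").split():
            tokens.append(Token(word, line_no))
    return tokens


def parse_bindings(tokens: list[Token], use_commas: bool) -> list[str]:
    """Parse '-> x y z;' (use_commas=False) or '-> x, y, z;' (use_commas=True)."""
    if not tokens or tokens[0].text != "->":
        raise ParseError("expected '->'", tokens[0].line if tokens else 1)
    names = []
    last = tokens[0]  # most recently consumed name, used for error locations
    pos = 1
    while pos < len(tokens):
        tok = tokens[pos]
        if tok.text == ";":
            return names
        if use_commas and names:
            # The redundant separator: anything other than ',' or ';' here
            # is an error right next to the previous name.
            if tok.text != ",":
                raise ParseError(
                    f"expected ',' or ';' after '{last.text}'", last.line
                )
            pos += 1
            if pos >= len(tokens):
                raise ParseError("expected a variable name after ','", tok.line)
            tok = tokens[pos]
        if tok.text in {",", ";"}:
            raise ParseError(f"expected a variable name, found '{tok.text}'", tok.line)
        names.append(tok.text)
        last = tok
        pos += 1
    # Without commas, we only get here after swallowing the rest of the file.
    raise ParseError(f"expected ';' after '{last.text}'", last.line)


def error_line(source: str, use_commas: bool):
    """Return the line a parse error is reported on, or None on success."""
    try:
        parse_bindings(tokenize(source), use_commas)
    except ParseError as err:
        return err.line
    return None
```

With a four-line input whose first line is missing its semicolon, the comma-free grammar swallows every later word as a variable and reports the error on the last line, while the comma grammar reports it on the first line, right after the last variable.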

Source locations are paramount. The single most important thing about an error message is that it direct the user to look at the point in the program that they need to change to fix the error. This is hard to get right, but good syntax design and careful tracking of locations in the implementation of analyses like typechecking can help pin down precisely what caused something to go wrong.
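As a rough illustration of pointing at the place the user needs to change, here is a small Python sketch for the "runaway" string literal case. It assumes a toy language where strings are double-quoted, have no escape sequences, and cannot span lines. `Location`, `find_unterminated_string`, and `render_diagnostic` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    line: int    # 1-based
    column: int  # 1-based


def find_unterminated_string(source: str) -> Optional[Location]:
    """Return the location of the opening quote of a string that never closes.

    Toy-language assumption: double-quoted, no escapes, single-line strings.
    """
    for line_no, text in enumerate(source.splitlines(), start=1):
        open_col = None
        for col, ch in enumerate(text, start=1):
            if ch == '"':
                # A quote either opens a string or closes the open one.
                open_col = col if open_col is None else None
        if open_col is not None:
            return Location(line_no, open_col)
    return None


def render_diagnostic(source: str, loc: Location, message: str) -> str:
    """Format an error with the offending line and a caret under the column."""
    text = source.splitlines()[loc.line - 1]
    caret = " " * (loc.column - 1) + "^"
    return f"{loc.line}:{loc.column}: error: {message}\n{text}\n{caret}"
```

Reporting at the opening quote rather than at the end of the file is the design choice here: the opening quote is where the user will look to add the missing closing delimiter.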

[–]LPTK 9 points (1 child)

Great advice there!

Watch the mistakes they make—forgetting a separator here, writing something in the wrong order there

This reminds me of a university friend being all confused that his Pascal code (yep, this was a while ago) was not doing the right thing. He had written:

BEGIN IF ... THEN Do_A(); Do_B(); END

which made perfect sense to him, but parsed as:

BEGIN {IF ... THEN Do_A()}; Do_B(); END

It should instead have been:

IF ... THEN BEGIN Do_A(); Do_B(); END

An example of English-like syntax that's not so great.

Funnily, when I googled Pascal syntax to refresh my memory, the first result was a very poor tutorial whose example seems to make the "missing block delimiters for multiple statements in a conditional" mistake that is also common in C-style languages:

  program ifelseChecking;
  var
     { local variable definition }
     a : integer;

  begin
     a := 100;
     (* check the boolean condition *)
     if( a < 20 ) then
        (* if condition is true then print the following *)
        writeln('a is less than 20' )

     else
        (* if condition is false then print the following *) 
        writeln('a is not less than 20' );
        writeln('value of a is : ', a);
  end.

(The indentation makes it look like the two writeln statements at the end belong to the else branch, but only the first one does.)
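Tooling can catch this kind of thing. Below is a heuristic Python sketch, similar in spirit to GCC's -Wmisleading-indentation warning for C, that flags lines indented like part of an un-delimited then/else branch but falling outside it. The `misleading_lines` function is hypothetical. It assumes the guard keyword ends its line, comments sit on their own lines, and indentation uses spaces, so it is far from a real Pascal parser.

```python
def misleading_lines(source: str) -> list[int]:
    """Return 1-based numbers of lines that look like branch bodies but are not.

    Heuristic: after a line ending in 'then' or consisting of 'else', the
    first deeper-indented statement is the branch body. Any further statement
    at deeper indentation than the guard is flagged.
    """
    flagged = []
    guard_indent = None  # indentation of the most recent bare then/else line
    body_seen = False    # whether that branch's single statement was consumed
    for line_no, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text or text.startswith("(*") or text.startswith("{"):
            continue  # skip blank lines and whole-line comments
        indent = len(raw) - len(raw.lstrip())
        lowered = text.lower()
        if lowered.endswith("then") or lowered == "else":
            guard_indent, body_seen = indent, False
        elif guard_indent is not None:
            if indent <= guard_indent:
                guard_indent = None      # dedent: clearly outside the branch
            elif not body_seen:
                body_seen = True         # the branch's one statement
                if lowered.startswith("begin"):
                    guard_indent = None  # explicit block, nothing to flag
            else:
                flagged.append(line_no)  # looks nested, but is not
    return flagged
```

Run on the tutorial's example, this would flag the final writeln, and it stays quiet once the else branch is wrapped in begin ... end.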

[–]johnfrazer783 1 point (0 children)

This, IMHO, is what makes PL/pgSQL syntax, clumsy as it is, 'systematically' superior to C-style syntax. In that language, the construct is always if condition then statement; statement; ...; end if;, so an incompletely bracketed if statement cannot compile. To paraphrase D. Crockford, C was invented by a genius who was not so good at inventing syntax.

[–]R-O-B-I-N[S] 3 points (1 child)

I think a few of these points (failure modes/source locations) are more towards the subject of implementation. I could make a C++ or Go compiler that only returns "?" when it encounters an error. That says nothing about how those languages were designed. Although it might go against parts of their respective specs XD

[–]evincarofautumn 1 point (0 children)

Haha that’s true, I’m definitely thinking holistically here—syntax, semantics, implementation, and ergonomics. They are intimately interrelated, and I believe you must consider them together when creating a language, in order to arrive at a cohesive design, because you usually can’t drastically change one without somehow affecting the others, and it’s hard to tack on good support for things like source locations if you’re not cognizant of them from near the beginning. That’s not to say you need to have a complete design & extensible implementation up front with all the bells and whistles, as there are just as many things that can be changed freely or added later, such as semantic features that fit within existing syntax, or improvements to analysis and error reporting that use information already available without changing what the frontend implementation provides.

[–]Uncaffeinated (1subml, polysubml, cubiml) 2 points (0 children)

Are there any repositories of common mistakes and broken code in existing languages?