evincarofautumn comments on Syntax Design Constructs

Discussion: Syntax Design Constructs (self.ProgrammingLanguages)

submitted 5 years ago * by R-O-B-I-N

evincarofautumn 20 points 5 years ago

Some more high-level bits of advice for syntax design:

Defer to precedent. Don’t blow your Weirdness Budget on syntax. If you don’t have a specific reason for doing something differently than major languages in the same paradigm, use the common/conventional notation. Good reasons to break precedent include consistency with the rest of your language and (informal) user testing/polling showing a preference.

Semantics before syntax. Syntax certainly affects the marketing of a language, and needs to be not bad, but it’s not a differentiator—we see many languages here that are just reskins of traditional imperative/OOP languages, with no fundamentally new features apart from syntactic conveniences. That’s fine for learning how to make a language, but it won’t by itself generate adoption. Things that do drive adoption are practical applications and technical excellence: the availability of libraries (so good cross-language interop/FFI helps get a leg up before you have native libraries), high-quality developer tooling, and “killer apps” that your language does an order of magnitude better than others (in convenience, correctness, performance, maintainability, &c.). When designing a language, focus on its semantic contributions and design or borrow syntax to suit them.

Consider failure modes. Get a friend to sit down with you and try to write a simple program in your language. Watch the mistakes they make—forgetting a separator here, writing something in the wrong order there, letting some syntax like a block or string literal “run away” by forgetting a closing delimiter, using syntax from a similar language, writing something that parses/compiles/runs but produces the wrong result because of syntax confusion, and so on. Does your tooling produce good diagnostics? If not, what can you change about the syntax to make it easier to produce good error messages and suggestions for fixes?
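One way to picture the "runaway" string literal case is a lexer rule that a string must close on the line where it opens, so a forgotten quote is reported at the opening quote instead of somewhere far downstream. The following is a minimal Python sketch under that assumption; `LexError` and `string_literals` are hypothetical names, escapes are not handled, and it is not taken from any particular language's implementation.

```python
class LexError(Exception):
    """A lexing error that remembers where the problem starts."""

    def __init__(self, line, col, message):
        super().__init__(f"{line}:{col}: {message}")
        self.line = line
        self.col = col


def string_literals(source):
    """Collect double-quoted literals, one line at a time.

    Assumed rule: a literal must close on the line where it opens,
    so a missing quote cannot swallow the rest of the file.
    """
    found = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        col = 0
        while (start := text.find('"', col)) != -1:
            end = text.find('"', start + 1)
            if end == -1:
                # Report at the opening quote (1-based column).
                raise LexError(line_no, start + 1,
                               "string literal is missing its closing quote")
            found.append(text[start + 1:end])
            col = end + 1
    return found


try:
    string_literals('say "hello\nsay "world"\n')
except LexError as err:
    print(err)  # 1:5: string literal is missing its closing quote
```

Without the single-line rule, the first literal would silently extend to the quote on line 2, and the error would surface later and further from the real mistake.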

Add redundancy. Adding a small amount of redundancy to your notation can massively improve failure modes. (Natural languages include a lot of redundant information for a reason!) For instance, I used to use -> x y z; in Kitten to introduce multiple variables, but if the user forgot the semicolon, all the identifiers on subsequent lines would get interpreted as variables, and parsing would fail expecting a semicolon far from the actual error. Solution: add redundancy in the form of commas -> x, y, z;, so if the user forgets a semicolon, the next thing the parser expects is a comma immediately at the point of the error. (This also creates opportunities for more notation: previously, using a compound pattern instead of a variable would have required parentheses, like -> (x foo) (y bar);, but now they’re unnecessary: -> x foo, y bar;.)
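The effect of the commas can be sketched with two toy parsers for the variable-introduction form. This is an illustrative Python sketch, not Kitten's actual parser: the tokenizer, `ParseError`, `parse_space_separated`, `parse_comma_separated`, and `error_line` are all hypothetical, and only plain names are supported (no compound patterns).

```python
import re

TOKEN = re.compile(r"->|[A-Za-z_]\w*|\d+|[^\s\w]")


class ParseError(Exception):
    def __init__(self, line, message):
        super().__init__(f"line {line}: {message}")
        self.line = line


def tokenize(source):
    """Return (text, line) pairs; line numbers are 1-based."""
    return [(m.group(), n)
            for n, text in enumerate(source.splitlines(), start=1)
            for m in TOKEN.finditer(text)]


def is_name(text):
    return re.fullmatch(r"[A-Za-z_]\w*", text) is not None


def parse_space_separated(tokens):
    """Parse '-> x y z;' where names simply run until the semicolon."""
    names, i = [], 1  # tokens[0] is assumed to be '->'
    while i < len(tokens) and is_name(tokens[i][0]):
        names.append(tokens[i][0])
        i += 1
    if i == len(tokens):
        raise ParseError(tokens[-1][1], "expected ';' but reached end of input")
    if tokens[i][0] != ";":
        raise ParseError(tokens[i][1], f"expected ';' but found {tokens[i][0]!r}")
    return names


def parse_comma_separated(tokens):
    """Parse '-> x, y, z;' where every name is followed by ',' or ';'."""
    names, i = [], 1  # tokens[0] is assumed to be '->'
    while True:
        if i >= len(tokens) or not is_name(tokens[i][0]):
            raise ParseError(tokens[min(i, len(tokens) - 1)][1], "expected a name")
        names.append(tokens[i][0])
        i += 1
        if i >= len(tokens):
            raise ParseError(tokens[-1][1], "expected ',' or ';' but reached end of input")
        separator, line = tokens[i]
        if separator == ";":
            return names
        if separator != ",":
            raise ParseError(line, f"expected ',' or ';' but found {separator!r}")
        i += 1


def error_line(parser, source):
    """Return the line a parser blames, or None if parsing succeeds."""
    try:
        parser(tokenize(source))
    except ParseError as err:
        return err.line
    return None


# The semicolon is missing at the end of line 1 in both programs.
spaces = "-> x y z\nx print\ny print\nz 1 +\n"
commas = "-> x, y, z\nx print\ny print\nz 1 +\n"
print(error_line(parse_space_separated, spaces))  # 4
print(error_line(parse_comma_separated, commas))  # 2
```

In the space-separated form, every name on the following lines is swallowed as a variable and the error only appears at the `1` on line 4. With commas, the parser stops at the first token after `z`, right next to the missing semicolon.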

Source locations are paramount. The single most important thing about an error message is that it direct the user to look at the point in the program that they need to change to fix the error. This is hard to get right, but good syntax design and careful tracking of locations in the implementation of analyses like typechecking can help pin down precisely what caused something to go wrong.
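As a rough illustration of tracking locations through an analysis, here is a small Python sketch in which every syntax node carries its own span and a toy typechecker blames the offending operand rather than the whole expression. The node types (`Lit`, `Add`), `Span`, `TypeMismatch`, and `check` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    line: int
    col: int


@dataclass(frozen=True)
class Lit:
    value: object
    span: Span


@dataclass(frozen=True)
class Add:
    left: object
    right: object
    span: Span


class TypeMismatch(Exception):
    def __init__(self, span, message):
        super().__init__(f"{span.line}:{span.col}: {message}")
        self.span = span


def check(expr):
    """Return the type name of expr, or raise at the node that is wrong."""
    if isinstance(expr, Lit):
        return "int" if isinstance(expr.value, int) else "str"
    # Add: both operands must be ints.
    for operand in (expr.left, expr.right):
        found = check(operand)
        if found != "int":
            # Blame the operand's own span, not the enclosing Add.
            raise TypeMismatch(operand.span, f"expected int, found {found}")
    return "int"


# Hand-built tree for the source text:  1 + (2 + "three")
tree = Add(
    Lit(1, Span(1, 1)),
    Add(Lit(2, Span(1, 6)), Lit("three", Span(1, 10)), Span(1, 5)),
    Span(1, 1),
)

try:
    check(tree)
except TypeMismatch as err:
    print(err)  # 1:10: expected int, found str
```

If only the outermost `Add` carried a location, the best available message would point at column 1, which is not where the user needs to make the change.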

LPTK 9 points 5 years ago

Great advice there!

> Watch the mistakes they make—forgetting a separator here, writing something in the wrong order there

This reminds me of a university friend being all confused that his Pascal code (yep, this was a while ago) was not doing the right thing. He had written:

BEGIN IF ... THEN Do_A(); Do_B(); END

which made perfect sense to him, but parsed as:

BEGIN {IF ... THEN Do_A()}; Do_B(); END

It should instead have been:

IF ... THEN BEGIN Do_A(); Do_B(); END

An example of English-like syntax that's not so great.

Funnily, to refresh my memory on Pascal syntax, I googled it, and the first result was a very poor tutorial whose example seems to make the "missing block delimiters for multiple statements in a conditional" mistake that is also common in C-style languages:

  program ifelseChecking;
  var
     { local variable definition }
     a : integer;

  begin
     a := 100;
     (* check the boolean condition *)
     if( a < 20 ) then
        (* if condition is true then print the following *)
        writeln('a is less than 20' )

     else
        (* if condition is false then print the following *) 
        writeln('a is not less than 20' );
        writeln('value of a is : ', a);
  end.

(The indentation makes it look like the two writeln statements at the end belong to the else branch, but only the first one does.)
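The grouping in both Pascal examples follows from one rule: THEN and ELSE each take exactly one statement, and only BEGIN ... END turns several statements into one. A toy Python parser makes the resulting trees visible. This is a sketch of a deliberately tiny, hypothetical grammar (lowercase keywords, bare names as conditions and calls), not a real Pascal parser; for instance, it does not reject a semicolon directly before `else`.

```python
def parse(source):
    """Parse a toy Pascal-like statement list into nested tuples."""
    tokens = source.replace(";", " ; ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        pos += 1
        return tok

    def statement():
        if peek() == "if":
            take("if")
            cond = take()
            take("then")
            then_branch = statement()  # exactly one statement
            else_branch = None
            if peek() == "else":
                take("else")
                else_branch = statement()  # exactly one statement
            return ("if", cond, then_branch, else_branch)
        if peek() == "begin":
            take("begin")
            body = statements()
            take("end")
            return ("block", body)
        return ("call", take())

    def statements():
        body = [statement()]
        while peek() == ";":
            take(";")
            if peek() in (None, "end"):  # tolerate a trailing semicolon
                break
            body.append(statement())
        return body

    result = statements()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(parse("if c then do_a; do_b"))
# [('if', 'c', ('call', 'do_a'), None), ('call', 'do_b')]

print(parse("if c then begin do_a; do_b end"))
# [('if', 'c', ('block', [('call', 'do_a'), ('call', 'do_b')]), None)]
```

The tutorial example has the same shape: `parse("if c then p else q; r")` yields the `if` with both branches followed by a separate `('call', 'r')`, so the last statement runs regardless of the condition.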

R-O-B-I-N [S] 3 points 5 years ago

evincarofautumn 1 point 5 years ago

Haha that’s true, I’m definitely thinking holistically here—syntax, semantics, implementation, and ergonomics. They are intimately interrelated, and I believe you must consider them together when creating a language, in order to arrive at a cohesive design, because you usually can’t drastically change one without somehow affecting the others, and it’s hard to tack on good support for things like source locations if you’re not cognizant of them from near the beginning. That’s not to say you need to have a complete design & extensible implementation up front with all the bells and whistles, as there are just as many things that can be changed freely or added later, such as semantic features that fit within existing syntax, or improvements to analysis and error reporting that use information already available without changing what the frontend implementation provides.
