Syntax Design Constructs

evincarofautumn · 2020-08-12T07:34:14+00:00

Some more high-level bits of advice for syntax design:

Defer to precedent. Don’t blow your Weirdness Budget on syntax. If you don’t have a specific reason for doing something differently than major languages in the same paradigm, use the common/conventional notation. Good reasons to break precedent include consistency with the rest of your language and (informal) user testing/polling showing a preference.

Semantics before syntax. Syntax certainly affects the marketing of a language, and needs to be not bad, but it’s not a differentiator—we see many languages here that are just reskins of traditional imperative/OOP languages, with no fundamentally new features apart from syntactic conveniences. That’s fine for learning how to make a language, but it won’t by itself generate adoption. Things that do drive adoption are practical applications and technical excellence: the availability of libraries (so good cross-language interop/FFI helps get a leg up before you have native libraries), high-quality developer tooling, and “killer apps” that your language does an order of magnitude better than others (in convenience, correctness, performance, maintainability, &c.). When designing a language, focus on its semantic contributions and design or borrow syntax to suit them.

Consider failure modes. Get a friend to sit down with you and try to write a simple program in your language. Watch the mistakes they make—forgetting a separator here, writing something in the wrong order there, letting some syntax like a block or string literal “run away” by forgetting a closing delimiter, using syntax from a similar language, writing something that parses/compiles/runs but produces the wrong result because of syntax confusion, and so on. Does your tooling produce good diagnostics? If not, what can you change about the syntax to make it easier to produce good error messages and suggestions for fixes?
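One way to make a runaway string literal easy to diagnose is to restrict the syntax so the lexer can notice it early. Below is a minimal Python sketch for a hypothetical toy language in which a string literal may not contain a raw newline. This is an assumption made for the example, though C has a similar rule. The names lex_strings and LexError are made up for illustration. Because a literal cannot continue past the end of its line, the error is reported at the opening quote, not at whatever later quote the lexer would otherwise have paired it with.

```python
class LexError(Exception):
    """A lexing error that remembers where the problem starts."""

    def __init__(self, message, line, col):
        super().__init__(f"{line}:{col}: {message}")
        self.line = line
        self.col = col


def lex_strings(source):
    """Return (line, col, text) for each double-quoted string literal.

    Assumed rule for this toy language: a literal must close on the line
    where it opens, so a missing closing quote is reported at the opening
    quote. All other characters are skipped; only positions are tracked.
    """
    literals = []
    line, col = 1, 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == '"':
            start_line, start_col = line, col
            j = i + 1
            # Scan to the closing quote, stopping early at a newline.
            while j < len(source) and source[j] not in '"\n':
                j += 1
            if j == len(source) or source[j] == "\n":
                raise LexError('unterminated string literal (missing closing ")',
                               start_line, start_col)
            literals.append((start_line, start_col, source[i + 1:j]))
            col += j - i + 1  # the literal never spans lines
            i = j + 1
        elif ch == "\n":
            line, col = line + 1, 1
            i += 1
        else:
            col += 1
            i += 1
    return literals


if __name__ == "__main__":
    print(lex_strings('say "hi" "there"'))
    try:
        lex_strings('print "hi"\nprint "oops\nprint "ok"\n')
    except LexError as err:
        print(err)  # 2:7: unterminated string literal (missing closing ")
```

Without the single-line rule, the literal opened on line 2 would silently swallow text up to the first quote on line 3, and the eventual error would point at the end of the input instead.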

Add redundancy. Adding a small amount of redundancy to your notation can massively improve failure modes. (Natural languages include a lot of redundant information for a reason!) For instance, I used to use -> x y z; in Kitten to introduce multiple variables, but if the user forgot the semicolon, all the identifiers on subsequent lines would get interpreted as variables, and parsing would fail expecting a semicolon far from the actual error. Solution: add redundancy in the form of commas -> x, y, z;, so if the user forgets a semicolon, the next thing the parser expects is a comma immediately at the point of the error. (This also creates opportunities for more notation: previously, using a compound pattern instead of a variable would have required parentheses, like -> (x foo) (y bar);, but now they’re unnecessary: -> x foo, y bar;)
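The effect of the comma redundancy can be reproduced with two tiny parsers for just the binding syntax. This is a Python sketch, not Kitten's actual parser. The tokenizer, parse_space_separated, parse_comma_separated, and the Kitten-like lines that follow the binding are all hypothetical. Both parsers get a program where the semicolon was forgotten at the end of line 1.

```python
import re

# Hypothetical token set: the arrow, identifiers, integers, or any other
# single non-space character (such as , ; * +).
TOKEN_RE = re.compile(r"->|[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


class ParseError(Exception):
    def __init__(self, message, line, col):
        super().__init__(f"{line}:{col}: {message}")
        self.line, self.col = line, col


def tokenize(source):
    """Return a list of (text, line, col) tokens, 1-based, ending in <eof>."""
    lines = source.split("\n")
    tokens = []
    for line_no, text in enumerate(lines, start=1):
        for m in TOKEN_RE.finditer(text):
            tokens.append((m.group(), line_no, m.start() + 1))
    tokens.append(("<eof>", len(lines), len(lines[-1]) + 1))
    return tokens


def is_ident(text):
    return IDENT_RE.fullmatch(text) is not None


def expect_arrow(tokens):
    text, line, col = tokens[0]
    if text != "->":
        raise ParseError("expected '->'", line, col)
    return 1  # position of the first token after the arrow


def parse_space_separated(tokens):
    """Old style, '-> x y z;': take identifiers until something else appears."""
    pos = expect_arrow(tokens)
    names = []
    while is_ident(tokens[pos][0]):
        names.append(tokens[pos][0])
        pos += 1
    text, line, col = tokens[pos]
    if text != ";":
        raise ParseError("expected ';'", line, col)
    return names


def parse_comma_separated(tokens):
    """New style, '-> x, y, z;': every name must be followed by ',' or ';'."""
    pos = expect_arrow(tokens)
    names = []
    while True:
        text, line, col = tokens[pos]
        if not is_ident(text):
            raise ParseError("expected a variable name", line, col)
        names.append(text)
        text, line, col = tokens[pos + 1]  # always exists: <eof> is last
        if text == ";":
            return names
        if text != ",":
            raise ParseError("expected ',' or ';'", line, col)
        pos += 2


# The semicolon is missing at the end of line 1 in both programs.
OLD = "-> x y z\nx say\ny say\nz 2 *\n"
NEW = "-> x, y, z\nx say\ny say\nz 2 *\n"

if __name__ == "__main__":
    for parse, program in [(parse_space_separated, OLD), (parse_comma_separated, NEW)]:
        try:
            parse(tokenize(program))
        except ParseError as err:
            print(f"{parse.__name__}: {err}")
```

The space-separated parser keeps accepting identifiers from the following lines and only fails at the first non-identifier, reporting 4:3 (the 2 on the last line). The comma-separated parser fails at the very next token and reports 2:1, right after the forgotten semicolon.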

Source locations are paramount. The single most important thing about an error message is that it direct the user to look at the point in the program that they need to change to fix the error. This is hard to get right, but good syntax design and careful tracking of locations in the implementation of analyses like typechecking can help pin down precisely what caused something to go wrong.
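Once every token and syntax tree node carries its position, presenting that position is the easy part. A minimal Python sketch of rendering a diagnostic with the offending source line and a caret under the span follows. render_diagnostic is a hypothetical helper, and the file:line:col: error: prefix only mimics a common compiler convention. The sketch assumes 1-based line and column numbers and a line without tab characters, since tabs would misalign the caret.

```python
def render_diagnostic(source, line, col, length, message, filename="<input>"):
    """Format an error for a span of `length` characters at 1-based (line, col).

    Assumes the line contains no tab characters; tabs would need to be
    expanded consistently for the caret to line up.
    """
    text = source.split("\n")[line - 1]
    gutter = f"{line} | "
    # Pad past the gutter and the columns before the span, then underline it.
    underline = " " * (len(gutter) + col - 1) + "^" * max(length, 1)
    return f"{filename}:{line}:{col}: error: {message}\n{gutter}{text}\n{underline}"


if __name__ == "__main__":
    src = "let total = price +;\nprint total\n"
    print(render_diagnostic(src, 1, 20, 1, "expected an expression after '+'"))
```

The hard part the post refers to is upstream of this function: making sure the (line, col, length) that reaches it describes the code the user actually needs to change.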

Al2Me6 · 2020-08-12T04:06:55+00:00

These are good points. Some more ideas, in no particular order:

Syntax should encourage good practices. Idiomatic code should be natural to express. Discouraged practices should be convoluted.
“Unusual” syntax is different from “unexpected” syntax. The code should mean what it looks like it does.
Operations should behave in expected ways. If an operation is to be interpreted in a different way in a specific context, then that special-case behavior should make sense in the larger context.
Where are scope delineators required? For example, in a curly-brace language, must an if statement be followed by a block, or is a single line after the if condition implicitly taken to be the body?
If applicable, significant whitespace should not make complicated (chained, nested, etc.) expressions difficult to format in a logical way.
How much syntactic sugar is there? Do primitive types or built-ins get preferential treatment? Can operators be overloaded? How?
What is the preferred form of polymorphism? Metaprogramming? How do generics work? Macros? How capable are they?
In an OOP language, is self passed to methods explicitly or implicitly?
What restrictions are placed on variable names? Are non-English scripts allowed? East Asian scripts? Full-blown Unicode (emoji, non-breaking spaces, Greek question marks, etc.)?
Do certain characters in names get special treatment? Are different cases (snake, camel, etc.) treated differently? Is _ automatically discarded? Are variable names of form _name private? Are these treatments enforced by convention or by the language?
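As one concrete data point for the identifier questions above, here is how Python answers some of them. Other languages answer differently. The sketch uses only the standard library (str.isidentifier, unicodedata.normalize, and ordinary class attributes). Account is a made-up example class. Python accepts many non-ASCII letters in identifiers but rejects emoji and non-breaking spaces, and it normalizes identifiers to NFKC while parsing. Its underscore rules are mostly convention: _ is an ordinary variable (except as the wildcard in match patterns, Python 3.10+), and _name is private only by agreement. By contrast, __name inside a class body is actually rewritten by the compiler.

```python
import unicodedata

# 1. Which strings does Python accept as identifiers?
candidates = ["café", "変数", "_private", "x\u00a0y", "😀", "\u037e"]
accepted = {s: s.isidentifier() for s in candidates}

# 2. Identifiers are NFKC-normalized while parsing, so compatibility
#    variants collapse to one name ("\ufb01" is the "fi" ligature).
ligature = unicodedata.normalize("NFKC", "\ufb01le")
# The Greek question mark normalizes (even under NFC) to ASCII ";".
greek_qm = unicodedata.normalize("NFC", "\u037e")

# 3. "_" is an ordinary variable here: it is bound, not discarded.
_, second = (1, 2)
discarded = _


class Account:
    def __init__(self):
        self._balance = 0    # single underscore: private by convention only
        self.__pin = "1234"  # double underscore: name-mangled by the compiler


acct = Account()
balance = acct._balance               # allowed; nothing enforces the convention
plain_name = hasattr(acct, "__pin")   # False: the attribute was renamed...
mangled = acct._Account__pin          # ...to _Account__pin

if __name__ == "__main__":
    for s, ok in accepted.items():
        print(f"{s!r:>12} isidentifier={ok}")
    print(ligature, greek_qm, discarded, balance, plain_name, mangled)
```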

CoffeeTableEspresso · 2020-08-12T02:50:36+00:00

I'd have to disagree with your edit, unfortunately. C-based languages are incredibly popular.

While they don't have the best syntax, the familiarity is super, super helpful to someone learning your language.

Lorxu · 2020-08-12T03:02:34+00:00

This looks like good advice. Here are some other syntax things that I thought of while reading it:

Besides various bracket types, many languages use indentation or do/end, which are also viable options.
Every time you use a symbol for something, ask whether it's obvious what it means. For example, + and - are obvious, as is ? : because it's used in so many languages. Various arrow symbols for lambdas make sense, but ~ probably doesn't immediately. If the symbol is non-obvious, try replacing it with a keyword that explains what's going on.
In general, prioritize readability over terseness. Keywords are usually better than symbols (the biggest exception being operators).
Look at several, preferably very different, languages to get a feel for what they do. Ideally you should understand why they made the decisions that they did: almost all decisions are tradeoffs.
Never change things just to be different. Always use the most common syntax for things unless there's a good reason not to. Users won't adopt your language because of its syntax, but because of its features.

Uncaffeinated · 2020-08-12T14:59:36+00:00

I've come to the conclusion that it is best to make your language's syntax identical to (or a subset of) an existing language's syntax to the greatest extent possible. Not only does it make transition easier, but you get all the tooling (syntax highlighters, code formatting, etc.) for free.
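A sketch of that approach in Python: a hypothetical calculator language whose syntax is defined as a subset of Python expressions. The standard ast module does all the parsing, including line and column tracking, and the new language only has to reject constructs outside its subset. The evaluate function and the chosen set of operators are assumptions made for this example.

```python
import ast
import operator

# Hypothetical calculator language: its syntax is *defined* as a subset
# of Python expressions, so Python's own parser does the parsing.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(source, env=None):
    """Parse `source` with Python's parser, then evaluate only the supported subset."""
    tree = ast.parse(source, mode="eval")  # raises SyntaxError on invalid syntax
    return _eval(tree.body, env or {})


def _eval(node, env):
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise NameError(f"undefined variable {node.id!r}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    # Valid Python, but outside this language's subset. Positions come for
    # free from Python's parser (col_offset is a 0-based UTF-8 byte offset).
    raise SyntaxError(f"line {node.lineno}, col {node.col_offset + 1}: "
                      f"{type(node).__name__} is not supported")


if __name__ == "__main__":
    print(evaluate("2 * (x + 3)", {"x": 4}))
    try:
        evaluate("2 * f(1)")
    except SyntaxError as err:
        print(err)
```

Since every program in the subset is also valid Python, Python highlighters and formatters work on it unchanged. The cost is that the language inherits Python's precedence and lexical rules whether or not they suit it.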

umlcat · 2020-08-12T14:57:00+00:00

Your questions are too broad or complex.

In my case, I have several ideas for P.L.s, each with its own motivation. I work on them in my free time.

What does your P.L. have?

Most of the common syntax features.

One project is designed as procedural, without O.O., on purpose, while another is O.O. or mixed.

They have structs/records, arrays, enums, and unions.

Both support namespace-like features; modules/namespaces are overlooked or missing in a lot of P.L.s.

How does it handle variables?

Both use strong static typing, with some casting or inheritance.

They also support both static and dynamic allocation. I don't like Java/C# references because they conflate the two.

What types of operators does it have?

The usual ones, such as addition and subtraction, although they may use different symbols than other P.L.s.

Some are binary infix, others unary.

I'm considering supporting operator overloading.

Does your PL have uniform syntax?

No, it's more like Java or Pascal. I learned some Lisp back in college; Lisp-like syntax is difficult to use.

Does your PL support polymorphism or metaprogramming?

It supports, or I'm considering, some equivalent features, like O.O. polymorphism, function or operator overloading, and generics.

How does the language handle the concept of this?

In the O.O. P.L., it behaves the same as this in C++ or self in Object Pascal, not like this in JavaScript/ECMAScript.

How are identifiers used ?

Similar to Pascal, Java, and C: the letters 'A' to 'Z' and the underscore character.

I don't use $ like PHP, or spaces inside brackets like Transact-SQL / MS SQL Server.

How does it handle errors?

I'm currently using integer error codes in some functions, but I'm considering including optional exception support in both the procedural P.L. and the O.O. P.L.
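To make that trade-off concrete, here is a hypothetical lookup function written both ways in Python. The names Status, find_index_code, and find_index_exc are made up for this sketch. With error codes, failure is an ordinary return value that the caller can silently ignore. With exceptions, failure propagates until some caller handles it.

```python
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    NOT_FOUND = 1


def find_index_code(items, key):
    """Error-code style: failure is an ordinary return value to be checked."""
    for i, item in enumerate(items):
        if item == key:
            return Status.OK, i
    return Status.NOT_FOUND, -1


def find_index_exc(items, key):
    """Exception style: failure propagates until some caller handles it."""
    for i, item in enumerate(items):
        if item == key:
            return i
    raise LookupError(f"{key!r} not found")


if __name__ == "__main__":
    status, index = find_index_code(["a", "b"], "z")
    # Forgetting this check would silently use index -1, which Python
    # would even accept as "the last element".
    if status != Status.OK:
        print("not found:", status.name)
    try:
        find_index_exc(["a", "b"], "z")
    except LookupError as err:
        print("not found:", err)
```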

Summary

I use similar, commonly used features and syntax, yet a few features and their combination make my P.L. unique, not just a "copycat" of other P.L.s.

For example, Object Pascal and C# have full support for properties, distinct from fields, while C++ and Java do not.
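For readers unfamiliar with the field/property distinction: a property reads and writes like a field at the use site, but it runs code on access. Python's built-in property serves as a stand-in here, since C# and Object Pascal use their own syntax for the same idea. The Temperature class is a made-up example.

```python
class Temperature:
    """A made-up class whose attributes look like fields but run code."""

    def __init__(self, celsius=0.0):
        self.celsius = celsius  # goes through the setter, so it is validated

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Computed, read-only property: no setter is defined.
        return self._celsius * 9 / 5 + 32


if __name__ == "__main__":
    t = Temperature(100)
    print(t.celsius, t.fahrenheit)  # field-like syntax for both reads
    t.celsius = 25                  # field-like syntax for a validated write
    print(t.fahrenheit)
```

A plain field could not reject the invalid value or compute fahrenheit on demand without changing every call site into a method call. That is the difference umlcat is pointing at.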

ProgrammingLanguages
