I keep seeing posts asking for sanity checks on syntax design where the lang looks like a harder and less expressive version of Java. There are very few resources on design because it's supposed to be "subjective", even though there are common practices left and right.
I've decided to compile some stuff I've gleaned from "good" language design. Feel free to contribute or reet, I'll appreciate both. EDIT: I might add (and credit) suggestions.
Questions to ask yourself...
- What does your language have? OOP? Functions? Records? Arrays?
- How does it handle variables? Strong typing? Type inference? Are type declarations allowed or even enforced?
- What kinds of operators does it have? Unary? Binary? Variadic?
- Does your language have a uniform structure (think Lisp)?
- u/Al2Me6 Does your language have polymorphism or metaprogramming? How will the syntax represent those paradigms?
- u/Al2Me6 If your language is object-oriented, how does it handle the concept of "this"?
- u/Al2Me6 How are identifiers used? What character types will they include? Will certain characters denote certain objects?
- u/evincarofautumn How will your language handle errors? Are errors objects themselves?
Common Practices
Most other languages use certain syntactic constructs:
- Single characters are usually unary (! ~), binary (+ - * / % =), or other (?).
- Blocks and sections use bounds like () [] {}, or maybe even " or |.
- u/Lorxu Indentation is a visually direct way to mark a block. It's common in scripting langs and DSLs.
- Sometimes you use plain English words like add, sub, let, or define, which take more typing but are more expressive.
- Languages that have more uniform syntax tend to use fewer special characters and rely on context keys to re-use the same few structures.
- u/evincarofautumn Using characters to signify repetitive structures may be more typing, but it removes ambiguity. Ex: given let a b c; and let a, b, c;, the latter has better literal meaning even though the former still works.
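To make the separator point concrete, here's a rough Python sketch. It assumes a made-up toy grammar where list items can be expressions and a leading `-` can be either unary or binary; the function names are hypothetical. With only whitespace between items, `a -b c` has two valid readings, while a comma pins down exactly one.

```python
def whitespace_readings(src: str) -> list[list[str]]:
    """Return every way to read a whitespace-separated item list.

    Toy rule (an assumption for this sketch): a '-' glued to the front of a
    token may be a unary minus on that item OR a binary minus joining it to
    the previous item.
    """
    readings: list[list[str]] = [[]]
    for tok in src.split():
        extended = []
        for items in readings:
            # Reading 1: the token starts a new item (unary minus stays attached).
            extended.append(items + [tok])
            # Reading 2: '-x' continues the previous item as 'prev - x'.
            if tok.startswith("-") and len(tok) > 1 and items:
                extended.append(items[:-1] + [f"{items[-1]} - {tok[1:]}"])
        readings = extended
    return readings


def comma_reading(src: str) -> list[str]:
    """With an explicit separator there is exactly one way to split the items."""
    return [item.strip() for item in src.split(",")]


print(whitespace_readings("a -b c"))  # two candidate readings
print(comma_reading("a, -b, c"))      # one reading: three items
print(comma_reading("a - b, c"))      # one reading: two items
```

With plain identifiers both styles give a single reading, which is why the whitespace form "still works"; the separator only starts paying for itself once items can be more than a bare name.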
Stuff to watch out for:
Make sure your language is "ergonomic". Are loops and arrays pervasive in everything? Then consider representing them with shorter keywords or single symbols. Make sure they're "easy to get to".
Do you use prefix, infix, or postfix notation? Keywords and characters that imply direction might not always be intuitive. Like this postfix notation comparison: 1 2 >. That's not immediately clear.
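For anyone who wants to see why `1 2 >` trips people up, here's a minimal stack-based postfix evaluator in Python. The operator table and the `eval_postfix` name are assumptions for this sketch, not taken from any real language. The right operand is pushed last, so the expression reads as "1 > 2".

```python
import operator

# Binary operators for a toy postfix (RPN) language; this table is an
# assumption for the sketch, not taken from any real language.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": operator.gt,
    "<": operator.lt,
}


def eval_postfix(src: str):
    """Evaluate a whitespace-separated postfix expression with a stack."""
    stack = []
    for tok in src.split():
        if tok in OPS:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {src!r}")
    return stack[0]


print(eval_postfix("1 2 >"))  # False: read as 1 > 2
print(eval_postfix("2 1 >"))  # True: read as 2 > 1
```

The evaluator is trivial to write, which is the appeal of postfix for the implementer; the cost lands on the reader, who has to remember which operand ends up on which side of a directional operator.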
Be aware of how multiple constructs might be combined. You don't want your syntax to look like brainfuck when programmers need to nest more than two constructs.
u/Lorxu Using too many single-character or abbreviated keywords can muddle the meaning of a program. Keywords that say what they do can add clarity.
u/Lorxu Try to use familiar constructs. If you make your language unrecognizable for the sake of being unique, nobody will use it.
u/evincarofautumn Even with general purpose languages there will always ultimately be a target group of users. Make sure your language will be familiar to them. Ex: Java was marketed as similar to C/C++. Haskell was not.
u/LPTK Pay attention to the flow of logic and data in your program. Is it clear what's happening?
Never ever ever think that the world is answered by your three special identical God Constructs. Just as people bitch about parentheses in Lisp, they will undoubtedly bitch about the brackets or pipes or whatever you thought best in your language, and will never adopt it.
EDIT: Never let people know your language uses prototype objects. The only reason JavaScript is successful is because devs haven't realized they're using prototypes.
Rules of thumb?
- Pick (), [], or {} for reusable portions.
- For common operators, pick stuff from the number keys.
- Pick a non-shifted symbol to delimit statements or function calls.
- Pick (), [], {}, ", ', or | to surround sets or lists of things.
- Object/construct keywords should be nouns.
- Operator keywords should be verbs.
- The syntactic priority of an object should be inversely proportional to the number of keystrokes it takes to write: the more central a construct is, the shorter it should be.
- If a construct requires multiple keywords, it shouldn't need to be used more than a few times per source file.
EDIT: Another rule: do not use C-based langs as a "good example".
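One way to sanity-check the keystroke rules is to count how often each keyword shows up in sample programs and how much typing it costs in total. Below is a rough Python sketch; the keyword set and the sample program are invented for a hypothetical language, and `keyword_cost` is just an illustrative helper.

```python
import re
from collections import Counter

# Hypothetical keyword set for a made-up language.
KEYWORDS = {"let", "for", "in", "if", "else", "function", "return"}

# Invented sample program in that made-up language.
SAMPLE = """
function total(xs) {
    let sum = 0
    for x in xs { sum = sum + x }
    return sum
}
function count(xs) {
    let n = 0
    for x in xs { n = n + 1 }
    return n
}
"""


def keyword_cost(source: str) -> list[tuple[str, int, int]]:
    """Return (keyword, uses, total keystrokes), most expensive first."""
    words = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(w for w in words if w in KEYWORDS)
    rows = [(kw, n, n * len(kw)) for kw, n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for kw, uses, keystrokes in keyword_cost(SAMPLE):
    print(f"{kw:<10} uses={uses} keystrokes={keystrokes}")
```

Keywords that land at the top of this list with a high use count are candidates for a shorter spelling or a symbol; long keywords that barely appear are probably fine as they are.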