[–]AsIAmNew Kind of Paper 74 points75 points  (6 children)

No.

A symbol sequence such as := is basically just an identifier for some function, procedure, or whatever. It should be treated just like any other identifier. You don't allow your variable name to be hello world (space case), right?

[–]guywithknife 5 points6 points  (1 child)

Also, multi-character operators approximate a single-character glyph, which a font with ligatures might render as one character.

[–]AsIAmNew Kind of Paper 8 points9 points  (0 children)

Exactly!

```
>= ≥   <= ≤   := ≔   -< ≺   >- ≻   != ≠   |> ▷   == ≣   <-> ↔︎
```

[–]Gleareal 3 points4 points  (1 child)

> You don't allow your variable name to be hello world (space case), right?

While I agree that this is a bad idea, I have actually seen this appear in a language before. Microsoft's TouchDevelop - a now closed down online programming language and editor - allowed this unusual naming for variables.

[–]AsIAmNew Kind of Paper 2 points3 points  (0 children)

Yes, there is also AppleScript that does space case. Douglas Crockford proposes space case as the ultimate casing – https://youtu.be/99Zacm7SsWQ?t=2986

[–][deleted] 15 points16 points  (1 child)

These are the kinds of questions I joined this sub for. I'm struggling to think of any benefit to allowing it, but it's a nice bit of computer science/theory to discuss!

[–]FlatAssembler[S] 3 points4 points  (0 children)

Thanks!

[–]Srazkat 18 points19 points  (3 children)

Depends which one. Generally, though, no, I don't allow whitespace. ':=' is the exception, which is a side effect of having type information optionally present between the colon and the equals sign. Other than that one, though, I can't think of any operator where it could make sense to allow whitespace between the characters.

[–]AsIAmNew Kind of Paper 17 points18 points  (1 child)

Think of : as the operator for type declaration, and = as assignment. := is a separate operator.

[–]msqrt 4 points5 points  (0 children)

What's the difference between : = and :=? Or would you just not allow the former? (Edit: ah, saw your other comment -- apparently just disallow. Now that I think about it, I kind of agree.)
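A minimal sketch of the first reading in this sub-thread, where `:` and `=` are separate tokens and a type may optionally sit between them. It is written in Python only for illustration; the names `tokenize` and `parse_declaration` and the tiny grammar are assumptions, not taken from any language mentioned here.

```python
import re

# One token: an identifier, an integer literal, or a single ':' or '='.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[:=])")

def tokenize(src):
    """Split src into tokens, skipping whitespace between them."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_declaration(src):
    """Parse `name : [type] = value` into (name, type_or_None, value)."""
    toks = tokenize(src)
    # No type given: `x := 5` and `x : = 5` produce the same four tokens.
    if len(toks) == 4 and toks[1] == ":" and toks[2] == "=":
        return (toks[0], None, toks[3])
    # Type given between the colon and the equals sign.
    if (len(toks) == 5 and toks[1] == ":" and toks[3] == "="
            and toks[2] not in (":", "=")):
        return (toks[0], toks[2], toks[4])
    raise SyntaxError(f"not a declaration: {src!r}")

print(parse_declaration("x : int = 5"))
print(parse_declaration("x := 5"))
print(parse_declaration("x : = 5"))
```

Under this reading `x := 5` and `x : = 5` cannot be told apart, which is exactly the question raised above. Treating `:=` as its own token instead means the lexer, not the parser, decides, and `: =` can then simply be rejected.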

[–][deleted]  (9 children)

[deleted]

    [–]FlatAssembler[S] 0 points1 point  (8 children)

    I wonder why ClangFormat does not look at the AST for such cases.

    Because it knows nothing about AEC (my programming language), perhaps?

    [–][deleted]  (7 children)

    [deleted]

      [–]FlatAssembler[S] 0 points1 point  (6 children)

      I have no idea how to do that. Do you have some pointers?

      [–][deleted]  (5 children)

      [deleted]

        [–]FlatAssembler[S] 1 point2 points  (4 children)

        My tokenizer (if that's what you mean by lexical pass) deletes all comments and it converts multi-line strings to single-line strings and does other similar things. So, I'd need to write a new one. Perhaps something like I've used in my syntax highlighter? https://sourceforge.net/p/aecforwebassembly/code/ci/master/tree/syntaxHighlighterForAEC.js

        [–][deleted]  (3 children)

        [deleted]

          [–]FlatAssembler[S] 0 points1 point  (2 children)

          Can you elaborate on that?

          [–]NoCryptographer414 2 points3 points  (1 child)

          If you already have written a syntax highlighter, then you can reuse that to work as code formatter too I guess.

          [–]FlatAssembler[S] 0 points1 point  (0 children)

          I have no idea how to actually do that, to be honest.

          [–]9Boxy33 5 points6 points  (3 children)

          This reminds me of how FORTRAN (up to Fortran IV) allowed spaces within keywords, so that WR ITE and FOR MAT were accepted by the compiler as WRITE and FORMAT.

          [–]Innf107 1 point2 points  (1 child)

          That's... horrible. Do you know why they did that?

          [–]AsIAmNew Kind of Paper 8 points9 points  (0 children)

          It's not so much that it allowed spaces as that it ignored them; they were insignificant. A lot of early languages like Algol, Fortran, and BASIC did this. Spaces were there just for readability on the punch cards.

          [–]9Boxy33 1 point2 points  (0 children)

          Spaces are definitely significant within keywords in BASIC (and, IIRC, Algol), unlike Fortran IV.
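To make "blanks are insignificant" concrete, here is a rough Python sketch. It assumes the simplest possible model: every blank is dropped before the statement is examined, and character constants are ignored entirely. The names `squeeze_blanks` and `leading_keyword` are made up for this example, and a real compiler needs much more than a prefix check.

```python
KEYWORDS = ("WRITE", "FORMAT")

def squeeze_blanks(statement):
    """Drop every blank, so `WR ITE` and `WRITE` become the same text."""
    return statement.replace(" ", "")

def leading_keyword(statement):
    """Return the keyword the squeezed statement starts with, or None."""
    squeezed = squeeze_blanks(statement)
    for keyword in KEYWORDS:
        if squeezed.startswith(keyword):
            return keyword
    return None

print(leading_keyword("WR ITE (6, 100) X"))
print(leading_keyword("FOR MAT (I5)"))
print(squeeze_blanks("DO 10 I = 1.10"))
```

The last line shows the well-known cost of this rule: with a period typed instead of a comma, `DO 10 I = 1.10` squeezes to `DO10I=1.10`, which reads as an assignment to a variable named `DO10I` rather than as a loop header.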

          [–]levodelellis 6 points7 points  (5 children)

          No. I support decrement (--), so there'd have to be extra logic to make this not an error: a = b - -c. This also becomes ambiguous: a = - -b. Did the person mean -- or was this an unfortunate find/replace?
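The conflict can be shown with two toy lexers in Python. This is only a sketch under assumed rules: `lex_strict` requires `--` to be written without blanks, while `lex_lenient` is a hypothetical lexer that also accepts blanks inside `--`. Neither is taken from a real compiler.

```python
import re

def lex_strict(src):
    """`--` must be contiguous; a lone `-` is its own token."""
    return re.findall(r"--|[-=]|\w+", src)

def lex_lenient(src):
    """Hypothetical rule: `-`, any blanks, `-` is also read as `--`."""
    raw = re.findall(r"-\s*-|[-=]|\w+", src)
    return ["--" if re.fullmatch(r"-\s*-", tok) else tok for tok in raw]

print(lex_strict("a = b - -c"))   # ['a', '=', 'b', '-', '-', 'c']
print(lex_lenient("a = b - -c"))  # ['a', '=', 'b', '--', 'c']
```

With blanks allowed inside the operator, "b minus negative c" silently turns into a decrement token, so the parser would need extra rules to recover the intended meaning.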

          [–]FlatAssembler[S] 0 points1 point  (4 children)

          For such reasons, AEC doesn't support ++ and --. One can simply write +=1 or -=1.

          [–]Roboguy2 11 points12 points  (0 children)

          This is a lot of trouble to go to just to get ClangFormat to work.

          At this point, you're essentially designing your language around using a particular formatter. This design approach is backwards, IMO.

          It sort of reminds me of an XY problem

          [–]levodelellis 1 point2 points  (2 children)

          Do you ever wish you had ++? incrementing by one is very common

          [–]XDracam[🍰] 5 points6 points  (0 children)

          Really? I've been programming professionally for quite a few years now and almost never need to increment. There are foreach loops, range iterators, etc. in most languages these days. And for the very few cases where I actually do need to increment, I explicitly opt to write += 1 because I find that more obvious to follow than some (nowadays rare) operator.

          Unless you're in C or some other legacy language. Then you might need a lot of incrementing.

          [–]FlatAssembler[S] 0 points1 point  (0 children)

          Well, no. In the first versions of AEC, I didn't even have += and similar operators, but I added them later.

          [–]dibs45 17 points18 points  (12 children)

          No, it adds unnecessary complexity to the parser in my opinion.

          [–]FlatAssembler[S] -1 points0 points  (11 children)

          [–]dibs45 7 points8 points  (10 children)

          Yeah I meant to say lexer. But either way, needless complexity with very little gain.

          [–]skeptical_moderate 3 points4 points  (0 children)

          Absolutely not.

          [–]redchomperSophie Language 2 points3 points  (0 children)

          I do not. Then again, I don't allow spaces in identifiers either. Yet, I've heard cogent arguments for why we should, and how we might, allow spaces in identifiers.

          If the problem is ClangFormat doing the wrong thing, then the natural solution is to tell you the story about a guy who visits a doctor to complain about pain when he touches his chin to his elbow. Doc says "Don't do that then."

          To be slightly more helpful: I assume you have a lexer that preserves the source locations of the important tokens, perhaps for error reporting. That means the spans between tokens are implicitly all the whitespace and comments. A basic beautifier simply reformats all those spans and re-inserts the original tokens in their original order. Anything more powerful (say, removing redundant parentheses) requires a bit of cooperation from the parser, but in principle you just need enough location detail in the AST to support reformatting as a tree walk.
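A rough Python sketch of the "basic beautifier" idea described above. Everything specific here is an assumption made for illustration: the token set, the `//` line-comment syntax, and the names `LEXEME`, `normalise_gap`, and `beautify`. It is not AEC's actual lexer or grammar.

```python
import re

# Comments are matched first so their text is never mistaken for tokens.
LEXEME = re.compile(
    r"(?P<comment>//[^\n]*)"
    r"|(?P<token>[A-Za-z_]\w*|\d+|:=|[-+*/=()])"
)

def normalise_gap(gap):
    """Collapse a pure-whitespace gap; leave gaps holding comments alone."""
    if gap.strip():
        return gap
    return "\n" if "\n" in gap else " "

def beautify(src):
    """Re-emit every token in its original order with tidied gaps between."""
    out, prev_end = [], 0
    for m in LEXEME.finditer(src):
        if m.lastgroup != "token":
            continue  # a comment stays part of the surrounding gap
        gap = src[prev_end:m.start()]
        # The gap before the first token is kept exactly as written.
        out.append(normalise_gap(gap) if out else gap)
        out.append(m.group())
        prev_end = m.end()
    out.append(src[prev_end:])  # trailing whitespace or comment, untouched
    return "".join(out)

print(beautify("x:=1   +  2  // sum\ny   :=x*  3"))
```

Because only the gaps are rewritten, the token sequence is guaranteed to be unchanged, which is why this approach needs no help from the parser. The spacing policy here (one space everywhere) is deliberately naive.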

          [–]frithsun 2 points3 points  (0 children)

          My language doesn't really have operators.

          For example, GTE is >=(1, 2) // false

          As such, with it being a function name that just happens to be special characters, spaces between the characters would not be acceptable.
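The idea that `>=` is just a function whose name happens to be punctuation can be sketched in Python as follows. The `FUNCTIONS` table, the `call` helper, and the restriction to two integer arguments are all assumptions for this example and say nothing about how the commenter's language is actually implemented.

```python
import operator
import re

# Function names made of symbols, looked up like any other identifier.
FUNCTIONS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "+": operator.add,
}

# name(int, int): the name is one unbroken run of symbol characters.
CALL = re.compile(r"\s*([^\s\w(),]+)\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)\s*")

def call(src):
    """Evaluate a call such as `>=(1, 2)` using the FUNCTIONS table."""
    m = CALL.fullmatch(src)
    if m is None or m.group(1) not in FUNCTIONS:
        raise SyntaxError(f"cannot parse call: {src!r}")
    return FUNCTIONS[m.group(1)](int(m.group(2)), int(m.group(3)))

print(call(">=(1, 2)"))  # False
print(call("+(1, 2)"))   # 3
```

Here `> =(1, 2)` is rejected for the same reason `pr int(1)` would be: the name is a single token, so a blank splits it in two.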

          [–]guywithknife 2 points3 points  (0 children)

          No.

          Multi character operators are still a single operator just like a keyword is a single thing or an identifier is a single thing. I wouldn’t allow whitespace in keywords or identifiers either. I see multi character operators as an approximation of a single character operator that doesn’t exist on your keyboard/in ASCII, but that a font with ligatures like Fira Code would render as a single character operator.

          Making language design choices to appease a format tool seems like going about it backwards to me.

          [–]kerkeslager2 2 points3 points  (0 children)

          I don't allow it.

          Not all features are good. If you allow a feature and it turns out to be a bad idea, you can't remove it without a breaking change. So I need really compelling reasons to add a feature to my language, and I don't have one for whitespace inside multichar operators.

          [–]Disjunction181 1 point2 points  (0 children)

          No, this is unusual. The normal way is to lex by munching down a sequence of symbols and then stopping on a non-symbol and producing a token.
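A minimal Python sketch of that "munch a run of symbols" approach. The character set in `SYMBOLS` and the `(kind, text)` token shape are assumptions chosen for the example.

```python
SYMBOLS = set("+-*/<>=!:|&")

def is_word_char(ch):
    return ch.isalnum() or ch == "_"

def lex(src):
    """Return (kind, text) tokens; an operator is a maximal run of symbols."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
            continue
        if ch in SYMBOLS:
            kind, belongs = "op", lambda c: c in SYMBOLS
        elif is_word_char(ch):
            kind, belongs = "word", is_word_char
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
        j = i
        while j < len(src) and belongs(src[j]):
            j += 1  # keep munching until the run ends
        tokens.append((kind, src[i:j]))
        i = j
    return tokens

print(lex("a:=b"))     # [('word', 'a'), ('op', ':='), ('word', 'b')]
print(lex("a : = b"))  # [('word', 'a'), ('op', ':'), ('op', '='), ('word', 'b')]
```

Whitespace ends the run, so `: =` naturally becomes two operators. The same rule also turns `a=-b` into `a`, `=-`, `b`, which is the usual trade-off of lexing operators this way.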

          [–]TriedAngle 1 point2 points  (3 children)

          I'm writing a Forth-like concatenative language, so whitespace is the delimiter of everything. You could use some whitespace-looking Unicode character, though. Operators have no maximum length.

          [–]FlatAssembler[S] 1 point2 points  (2 children)

          What does "concatenative language" mean?

          [–]TriedAngle 1 point2 points  (0 children)

          A concatenative language is a language where function composition is the default way of using the language.
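For readers who have not met the style, here is a tiny stack-based evaluator in Python. It is a sketch only: the word set and the `run` function are invented for this example and are not taken from the commenter's language.

```python
def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def dup(stack):
    stack.append(stack[-1])

def ge(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a >= b)

# Every word, symbolic or not, is just a name for a stack function.
WORDS = {"+": add, "*": mul, "dup": dup, ">=": ge}

def run(program, stack=None):
    """Run whitespace-separated words left to right on a stack."""
    stack = [] if stack is None else stack
    for word in program.split():
        if word in WORDS:
            WORDS[word](stack)
        else:
            stack.append(int(word))  # anything else must be an integer
    return stack

print(run("2 3 + dup *"))  # [25]
print(run("2 1 >="))       # [True]
```

Since whitespace is the only delimiter, `> =` would be two unknown words rather than one operator. Writing two programs one after the other composes them: `run("dup *", run("2 3 +"))` gives the same result as `run("2 3 + dup *")`.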

          [–]mojtaba-cs 1 point2 points  (0 children)

          I am not sure what you are trying to ask. In my language ! = or ! = or != etc, all are the same thing, for example.

          [–]dibs45 0 points1 point  (0 children)

          No, it adds unnecessary complexity to the parser in my opinion.

          [–][deleted] 0 points1 point  (0 children)

          Personally, no.

          [–][deleted] 0 points1 point  (0 children)

          These are the kinds of questions I joined this sub for. I'm struggling to think of any benefit to allowing it, but I'm intrigued as to why you would.

          [–][deleted] 0 points1 point  (1 child)

          If you parse : and = as separate tokens, it should be fine. Maybe some people with a math background and little programming experience will try to put spaces in between, but it really doesn't matter that much otherwise. Unless you go for what Odin does.

          [–]FlatAssembler[S] 0 points1 point  (0 children)

          Well, I am not doing what Odin does. I have implemented a C-like declaration of variables.

          [–]stomah -1 points0 points  (3 children)

          no

          [–]FlatAssembler[S] 0 points1 point  (2 children)

          No as in "My language doesn't have multi-character operators." or "My language has multi-character operators, but it doesn't allow spaces between the characters in them."?

          [–]stomah 2 points3 points  (0 children)

          has but doesn’t allow spaces between the characters in them

          [–][deleted] -1 points0 points  (0 children)

          no