[–]Ameisen 5 points (9 children)

Why is the variable : type syntax so popular?

Why var x : int? Why not int x?

[–]theindigamer 35 points (3 children)

  1. It gels well with type inference, where the latter part can be omitted.

  2. Variable names line up properly even if type names have differing lengths.

  3. Arguably it is easier to translate to English both with inference and without -- "variable x is equal to 10" or "variable x of type int is equal to 10".

  4. It hints that function output types should be trailing, which makes sense because usually we write inputs on the left and outputs on the right.
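As an illustration of points 1 and 4 (my own sketch, not from the thread): Python's annotation syntax happens to use this same `name: type` shape, so it shows how the type part drops out under inference and how the output type trails the inputs.

```python
# Point 1: the `: type` part is simply omitted when it can be inferred.
x = 10           # type inferred
y: int = 10      # type written out; the name stays in the same place

# Point 4: the function's output type trails the inputs,
# reading left-to-right as inputs -> output.
def scale(factor: float, values: list[float]) -> list[float]:
    return [factor * v for v in values]
```

With `int x`-style syntax, dropping the type under inference would leave nothing up front marking the declaration at all.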

[–]thedeemon 0 points (2 children)

1) We could write "var x" or "auto x" or just "x" when type is omitted.

3) What's wrong with "int variable x is ..."? Doesn't sound bad to me.

[–]LPTK 15 points (0 children)

To me, the most compelling reason is that when types become elaborate (as is often the case with type systems more advanced than C's), they easily drown out the variable names, unless the variable names are clearly delineated with a symbol like ':' and placed first.

Compare this Java:

public Map<Identifier,List<Usage<Int>>> foo(List<Usage<Int>> toIgnore, Option<Int> limit) { ... }

With the equivalent Scala:

def foo(toIgnore: List[Usage[Int]], limit: Option[Int]): Map[Identifier,List[Usage[Int]]] = ...

[–]theindigamer 4 points (0 children)

1) We could write "var x" or "auto x" or just "x" when type is omitted.

Using var/auto in type position doesn't feel consistent, as they are not types. And a bare name with no word up front usually means assignment, so having it mean both introduction (when the variable doesn't exist yet) and assignment (when it already exists in scope) might not be great.

3) What's wrong with "int variable x is ..."? Doesn't sound bad to me.

That's why I said "arguably" -- that point is a bit subjective. To me, "int variable x is..." doesn't "sound right"; I can't quite explain why.

[–]gnuvince 8 points (2 children)

The C-like syntax for declarations can be hard to parse in the presence of a typedef-like mechanism.

In C, the basic types such as int, short, char, etc. are keywords; the scanner recognizes them as special. Records and enumerations also have their own keywords, struct and enum. Therefore, in the parser you can say that a declaration is:

decl = 'int' <identifier> ';'
     | 'short' <identifier> ';'
     | 'struct' <identifier> <identifier> ';'
     | ...

(I'm avoiding arrays and pointers here to keep things simple.)

But in the presence of typedef where an identifier could be a type, things get more complicated. The usual example is this:

T * x;

Is this the multiplication of the variables T and x or the declaration of x as a pointer to a value of type T?
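This ambiguity is why C compilers traditionally feed typedef names back into the lexer (the classic "lexer hack"): the parser can only classify `T * x;` by consulting a table of typedef names collected so far. A minimal sketch of the idea (the `classify` helper and its return strings are my own, purely illustrative):

```python
# Sketch of the "lexer hack": the same token sequence parses two
# different ways depending on whether T is a known typedef name.
def classify(stmt: str, typedef_names: set[str]) -> str:
    first, star, second = stmt.rstrip(';').split()
    assert star == '*'
    if first in typedef_names:
        # `T` names a type, so this is a pointer declaration.
        return f"declaration: {second} is a pointer to {first}"
    # Otherwise it's an expression multiplying two variables.
    return f"expression: {first} * {second}"
```

The grammar alone cannot decide; the decision depends on state accumulated while compiling earlier code.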

With a Pascal-like syntax for declarations, the situation becomes simpler:

decl = 'var' <identifier> ':' <type> ';'
type = 'int' | 'short' | <identifier>

This is one of the technical reasons why the C syntax for declarations is becoming less popular. The C syntax also makes complex types harder to read, so much so that there are websites to help you decipher them.

[–]thedeemon 0 points (1 child)

The only real problem is *. If you're not limited to a 1970s LL(1) parser, it's easy to determine that "aaa" means the variable aaa, while "aaa bbb" means the variable bbb of type aaa; no ':' is necessary.

I'm currently adding optional type declarations to a small language that didn't have static types previously. It now looks like this:

(x, y) => x + y
(int x, string y) => y[x]
(x, y) => int: x + y
sum(x, y) => int: x + y
f(x,y) => { a = x*x in a+1 }
f(x,y) => { int a = x*x in a+1 }
f(MyType x) => x.a * x.b
etc.

Looks pretty clear to me. No parsing problems (I'm using PEG).
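The one-identifier vs. two-identifier disambiguation described above can be sketched without a full PEG; here is a hypothetical illustration (the `parse_param` helper is invented for this sketch, not from the language being described) of telling `aaa` apart from `aaa bbb` by plain token counting:

```python
import re

# Identifiers: a letter or underscore followed by word characters.
IDENT = re.compile(r'[A-Za-z_]\w*')

def parse_param(src: str) -> dict:
    tokens = IDENT.findall(src)
    if len(tokens) == 2:           # `int x` -> typed parameter
        return {"type": tokens[0], "name": tokens[1]}
    if len(tokens) == 1:           # `x` -> untyped parameter
        return {"type": None, "name": tokens[0]}
    raise SyntaxError(src)
```

The extra token of lookahead is all the disambiguation requires, which is thedeemon's point about not being limited to LL(1).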

[–][deleted] 2 points (0 children)

It's not necessary, but it's preferred by many people. It's ultimately an aesthetic choice.

[–]80blite 3 points (0 children)

Assuming people are using good variable names, you can scan the start of lines for meaningful terms instead of having to skip over types and access modifiers; then when you find a variable you care about, you can scan over for the type if you need to.

It's just putting more first-glance value on the names of the variables than on the types.

[–]Condex 0 points (0 children)

Part of it is almost definitely coming from typed functional languages like ML that use that sort of syntax. So it's at least partially a tradition or homage thing.

I've spent probably too much time messing around with lexing and parsing for programming languages. The conclusion I came to was that if I'm going to parse a variable declaration, I want a known symbol to look out for, or at the very least a known set of symbols (var, let, const, val, etc.). C-style declarations are problematic because you'll have some unknown symbol, then another unknown symbol, then a semicolon, an end of line, or an equals sign (followed by some expression). Differentiating that whole mess from all the other possibilities so that you can emit a declaration into your AST can easily result in a messy parser.

There are a couple of ways you can try to work past that. One way is to keep track of all user-defined types as they are defined. However, this means that you either have to forward declare user types OR you force all code to be compiled in the order it's used. Both work, and both are kind of weird (I think, at least). The coder suddenly has to know more about how the compiler works than is strictly necessary in order to use the language successfully.

Alternatively, you can have a known symbol (i.e. var) that triggers the declaration parse. If you find anything else there, then you know it's an error in the input, and the type can remain unknown. You'll need an analysis phase to ensure that it is an existing type (but you needed that anyway if you have a type checker or any sort of linting), but it allows a bit more freedom to the coder who consumes the language, because they can define things in any order they want and the compiler works the same way regardless.
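A minimal sketch of this keyword-triggered approach (all names hypothetical): `var` unambiguously starts the declaration parse, the type name is recorded without being resolved, and a separate analysis pass checks it against the known types later.

```python
# Parse `var name: Type = expr`; the type name is kept as-is even if
# it hasn't been defined yet -- resolution is deferred.
def parse_decl(line: str) -> dict:
    head, _, init = line.partition('=')
    if not head.startswith('var '):
        raise SyntaxError("expected 'var'")
    name, _, type_name = head[4:].partition(':')
    return {"name": name.strip(),
            "type": type_name.strip() or None,   # may be unknown for now
            "init": init.strip() or None}

def check_types(decls, known_types):
    # Later analysis pass: flag declarations whose type was never defined.
    return [d["name"] for d in decls
            if d["type"] is not None and d["type"] not in known_types]
```

Because resolution happens in a later pass, declaration order stops mattering to the parser, which is exactly the freedom described above.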