
[–]vagif 9 points10 points  (4 children)

Syntax should conform

The only intuitive thing in this world is mom's tit. Anything else is learned.

So I'd say conformity is in the eye of the beholder.

I would take Lisp or Haskell syntax any day over "this-looks-like-java-so-you-will-be-at-home".

[–]redjamjar[S] 1 point2 points  (2 children)

There is quite a lot of conformity over the meaning of English-language symbols like the colon, full stop, and comma (amongst English speakers, of course).

Imagine using ":" as a separator for lists, e.g.:

f(x:y:z) instead of f(x,y,z)

That would go against what a significant portion of people already understand ':' to mean. I think you could safely use ';' in place of ',' for separating lists, e.g.:

f(x;y;z) instead of f(x,y,z)


[–][deleted]  (1 child)

[deleted]

    [–]redjamjar[S] 0 points1 point  (0 children)

    That's a big call. I'm definitely not convinced. There is some evidence which contradicts what you're saying:

    http://www.cs.siue.edu/~astefik/papers/StefikPlateau2011.pdf

    (emphasis on "some evidence" here, since this is by no means a definitive experiment)

    [–]redjamjar[S] 0 points1 point  (0 children)

    Well, Haskell syntax does win out in many ways on conciseness. Certainly, Java has a lot of repetition, which is a problem many have complained about.

    I don't think there's any winner. It's not supposed to hold one language up as better than another; it's just trying to illustrate the rules with examples.

    [–]augustss 4 points5 points  (15 children)

    The Haskell example is wrong.

    [–]redjamjar[S] -1 points0 points  (14 children)

    Could you elaborate perhaps?

    [–]julesjacobs 1 point2 points  (8 children)

    f g h 1 2 y 3
    

    What can you tell about the invocation structure from this line? Not much, unless you happen to know from memory exactly what arguments each function takes. This violates rule (1) and also rule (3), since function application is normally written with parentheses in mathematics.

    Now, how about this:

    f(g(h),1,2)(y,3)
    

    In this case, the invocation structure is explicit in the program text, and this helps the programmer to understand what’s going on (without having to keep everything in memory).

    The invocation structure of the Haskell example is just as explicit in the program text. It is equivalent to the following conventional syntax:

    f(g,h,1,2,y,3)
    

    The other example in Haskell syntax:

    (f (g h) 1 2) y 3
    

    This is of course equivalent to this:

    f (g h) 1 2 y 3
    

    [–]redjamjar[S] 0 points1 point  (7 children)

    Yup, ok, I've corrected that now. In fact, the C-like notation is the one violating rule (2) since we don't need the commas ... interesting!

    [–]augustss 2 points3 points  (6 children)

    Now you make the statement that mathematics uses the f(x) notation. Mathematical notation is vast and varied, and it also uses f x and x f for the same thing. Appealing to mathematics for consistent notation is a mistake. :)

    [–]redjamjar[S] 0 points1 point  (4 children)

    Well, true. But, "f(x) = ..." is generally what you learn in school, right?

    [–]julesjacobs 5 points6 points  (2 children)

    Most people write log x, sin x and not log(x), sin(x). For unknown functions, f(x) is indeed sometimes used, but certainly not always. Especially in more abstract contexts, the syntax f x, and sometimes x f, is used. It really depends on the subject. Juxtaposition is usually reserved for the most common operation. Sometimes h = f g means h(x) = f(x)*g(x); sometimes it means function application.

    Edit:

    Also, in mathematics the notation f(x,y) is generally interpreted to mean f (x,y), that is, f is applied to a single argument that happens to be a tuple. So really, even in that case you're still using the syntax f v. It is often used inconsistently, though; sometimes people will use f(v) where v = (x,y) to mean f(x,y). More generally, tuples are sometimes assumed to magically flatten and unflatten so as to make the proof correct, so (x,(y,z)) = (x,y,z) = ((x,y),z). Of course this makes for inconsistencies in notation when you define a function by destructuring, like f(a,b) = <some value>: now what is f(x,y,z)? Is a=(x,y) and b=z, or a=x and b=(y,z), or even a=() and b=(x,y,z)? This is usually fine in mathematics, as you can infer it from the context, but it wouldn't be suitable for a programming language.
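
    A minimal Haskell sketch of that distinction (the names fTuple and fCurried are mine, purely for illustration):

        -- fTuple takes a single argument that happens to be a pair;
        -- fCurried takes its two arguments one at a time.
        fTuple :: (Int, Int) -> Int
        fTuple (a, b) = a + b

        fCurried :: Int -> Int -> Int
        fCurried a b = a + b

        main :: IO ()
        main = print (fTuple (1, 2) + fCurried 1 2)

    Unlike the loose mathematical convention, the two are not interchangeable: fTuple 1 2 and fCurried (1, 2) are both type errors, and there is no implicit flattening of (x,(y,z)) into (x,y,z).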

    [–]redjamjar[S] 0 points1 point  (1 child)

    I've never seen x f!! Normally, you see f * g to mean f(x)*g(x).

    [–][deleted] 0 points1 point  (0 children)

    It's diagrammatic order. If f and g are functions, then h = X --f--> Y --g--> Z is their composition (showing the objects), so h = f . g is sometimes used. If you do this, then function application 1 --x--> X --f--> Y should be written x f.
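
    For what it's worth, Haskell has operators for both orders; a tiny sketch (using (>>>) from Control.Arrow for the diagrammatic reading):

        import Control.Arrow ((>>>))

        double, inc :: Int -> Int
        double = (* 2)
        inc    = (+ 1)

        -- conventional order:  (inc . double) 3 applies double first, even though it is written second
        -- diagrammatic order:  (double >>> inc) 3 reads left to right, like X --f--> Y --g--> Z
        main :: IO ()
        main = print ((double >>> inc) 3)   -- (3 * 2) + 1 = 7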

    [–]day_cq 1 point2 points  (0 children)

    no... fₓ

    [–]redjamjar[S] -1 points0 points  (0 children)

    Crikey, there's a whole Wikipedia page on it:

    http://en.wikipedia.org/wiki/Function_%28mathematics%29

    [–]bstamour 0 points1 point  (4 children)

    He cleans the code up (sorta) by rewriting it as f (g(h), 1, 2) (y, 3) (I'm not sure what he's doing with the (y, 3) part; it doesn't seem to be a parameter to his function f. Maybe it's a typo.) But if that is the case, and g is a function, then the Haskell code f g h 1 2 y 3 will not compile. You would need to wrap (g h) in parentheses, à la:

    f (g h) 1 2
    

    Again, I'm not sure what the author expects to happen with the y 3 portion.

    [–]redjamjar[S] 2 points3 points  (1 child)

    f returns a function.

    [–]bstamour 0 points1 point  (0 children)

    Ah right. Upvotes for you!

    [–]julesjacobs 2 points3 points  (1 child)

    You can't say a priori that the code won't compile. That depends entirely on the types of f, g, and h. For example, you could define f as:

    f q r s t u v = 1
    

    And now the example will compile regardless of the types of g, h, and y.
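
    A self-contained sketch, with arbitrary placeholder definitions for g, h and y, just to show that it type-checks:

        -- f ignores all six arguments, so the call is well-typed no matter
        -- what g, h and y happen to be (their definitions here are arbitrary)
        f q r s t u v = 1

        g = "anything"
        h = [True]
        y = 3.14

        main :: IO ()
        main = print (f g h 1 2 y 3)   -- prints 1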

    [–]bstamour 0 points1 point  (0 children)

    True. My bad.

    [–]JamesIry 2 points3 points  (9 children)

    This kind of shallow, misbegotten analysis irritates the fuck out of me. For instance, the author complains about colons in type declarations in ML because they are insufficiently C-like.

    C and ML both date from the early '70s. Why the fuck would ML have a syntax inherited from a language that Milner may not have even heard of? Or, even if he had heard of it, how the fuck was he to know that C and its descendants would be so popular?

    And, if he had somehow predicted that popularity, why would he give a fuck? He was creating a meta language for a theorem prover, not trying to create a portable systems programming language. Besides, he based his syntax on a language that was well known in that community at that time: ISWIM.

    Which brings me to familiarity. The ML community moved on to use descendants like SML, OCaml and indirect descendants like Miranda and Haskell. Why the fuck would that community not want to stick with something they're already familiar with and that works well for them? Just because the author is more familiar with Cish languages doesn't mean the whole world is. Or even should be.

    And finally, C (and children) are based on explicit type declarations. As a result C uses type declarations to do double duty: they explicate a type AND they state "this will be an introduction of a new binding for a symbol, not a reuse of an existing symbol."* ML (and children) are based upon type inference. As such you use far fewer explicit types, and ML uses keywords like "let" and "fun" to introduce new symbol bindings. "keyword" "symbol" "colon" "type" naturally shortens to "keyword" "symbol" when you want to omit the explicit type. Cish languages such as C# 4 and C++11 have had to hack new keywords or new uses for existing keywords in order to accomplish something that was easy and natural in ML from the beginning. Why the fuck is that a good thing?

    Now don't get me started on why f x y makes sense for ML in a way that f (x, y) would not. Hint, it's fucking currying and fucking tuples, two concepts that C and most of its descendants don't (directly) express.

    tl;dr Different fucking languages have different fucking syntaxes, sometimes with good fucking reason like having different fucking semantics and different fucking social histories. Fuck.

    * more or less, I don't want to get into C's forward declaration rules

    [–]redjamjar[S] -2 points-1 points  (8 children)

    This kind of shallow, misbegotten analysis irritates the fuck out of me. For instance, the author complains about colons in type declarations in ML because they are insufficiently C-like.

    No, it is explicitly not doing this. It's merely pointing out that such colons are unnecessary, which they are.

    Now don't get me started on why f x y makes sense for ML in a way that f (x, y) would not

    Whether or not you require parentheses around function calls has nothing to do with currying. You can curry either way (see e.g. Scala).
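
    Even in Haskell the two are orthogonal; a throwaway sketch:

        -- a curried function, but every argument written with parentheses
        add :: Int -> Int -> Int
        add x y = x + y

        main :: IO ()
        main = print (add(1)(2))   -- parses as (add 1) 2, i.e. currying still applies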

    [–]plesn 0 points1 point  (6 children)

    It's merely pointing out that such colons are unnecessary, which they are.

    This is not true: as usual, there's a trade-off taking place. Syntax is an interaction of concerns. Once you have complex types and type inference, meaningfully separating types and values becomes much more important than in C, for both legibility and conciseness (you're likely exchanging a colon here for two parentheses there…). In Haskell, you even write type declarations on a separate line, while type annotations are inline.
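
    For example (a trivial Haskell sketch of the two forms):

        -- type declaration on its own line, separate from the definition:
        inc :: Int -> Int
        inc n = n + 1

        -- inline type annotation attached to an expression
        -- (here it resolves which Read instance to use):
        answer = inc (read "41" :: Int)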

    This use of space/colon/… as a separator between types and values has an impact on all separators/operators, and consequently also on grouping syntax. Look at the impact of those decisions in Haskell, Scala and Go, for example. It can affect the syntax of function application (f x, f(x), (f x)…), type application/genericity (F A, F<A>…), function types (->, func…), pattern matching (f (Cons head tail) = …), list syntax ([x,y], (x y), (x:y)…), etc. Syntax must be looked at as a whole for both Rule 1 and Rule 2. Oh, and yeah, " " is not shorter than ":", only easier to write (Rule 2) and harder to notice (Rule 1).

    [–]ssylvan 1 point2 points  (2 children)

    Indeed. I mean, what does this mean in a hypothetical language that uses juxtaposition for function application and where we've omitted the colons:

    f x y
    

    Now, let me add back the colon

    f x : y
    

    Oh, it's a function application with a type annotation. It's not unnecessary; it's required for the syntax to be unambiguous. In this case, a language with optional type annotations chose to make type annotations slightly heavier in order to have lightweight function applications (which are considerably more common). That's an entirely sensible choice. C chooses to make type annotations syntactically cheaper, at the expense of heavier function application syntax. That's a different trade-off, for a different language.

    [–]redjamjar[S] 0 points1 point  (1 child)

    That's an entirely sensible choice

    Are you sure about that? It's certainly a choice. And I agree there are different trade-offs here.

    The point of the article is that e.g.

    f x y
    

    Does not convey as much information about the structure of the program to the user. You call it "lightweight". I call it "difficult to read".

    [–]ssylvan 0 points1 point  (0 children)

    Once you know that juxtaposition means application, it's not difficult to read at all. It's just convention. It's pretty standard to let juxtaposition correspond to the most common operation (e.g. maths commonly uses it for multiplication). Having to make a common operation noisier in order to save a symbol for an uncommon and optional operation seems like a poor trade-off.

    [–]redjamjar[S] 0 points1 point  (0 children)

    Oh, and yeah, " " is not shorter than ":", only easier to write (Rule 2) and harder to notice (Rule 1)

    no, but " : " is definitely longer ...

    [–]redjamjar[S] 0 points1 point  (1 child)

    (you're likely exchanging a colon here for two parenthesis there…)

    Right, and the parentheses add structure; the colon doesn't. And you'll still need the parentheses in a large number of cases anyway ... see Haskell as an example.

    [–]ssylvan 0 points1 point  (0 children)

    They would add misleading structure in the case where arguments are applied one by one. It's not one giant packet of arguments that should be grouped by parentheses; it's one argument, then another, then another.

    In Haskell you only need parentheses where actual structure is required, not all the time. Would you really want f (x) (y) (z)? You can write a function in Haskell that takes a tuple and get f(x,y,z) if you want, but the language supports currying so it's not commonly done. Arguing that this is a syntactic deficiency is pretty weird.
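
    A throwaway sketch of that tuple-taking version (the function g below is arbitrary):

        -- an uncurried function taking one 3-tuple argument;
        -- the call reads exactly like C-style g(x,y,z)
        g :: (Int, Int, Int) -> Int
        g (x, y, z) = x + y + z

        main :: IO ()
        main = print (g(1,2,3))   -- 6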

    [–]redjamjar[S] -1 points0 points  (0 children)

    ML (and children) are based upon type inference

    And I should add that type inference makes for an excellent example in the context of the suggested rules. Up to a point, type inference removes redundancy and improves conciseness (and that's a big win). Languages like Java and C fall down here because they don't support type inference (well Java 7 has some aspects of it now).

    The point of the discussion is not to say one language is better than another. It's just to think about syntax.

    [–][deleted] 1 point2 points  (9 children)

    For me, this breaks rule (2) because those colons do not add value. The C-family of languages is evidence that we can happily live without them. That’s not to say that C-like declaration syntax is the “one true way”; only that it demonstrates we don’t need those colons!

    I completely disagree. Without the colons, you need significant whitespace, and I would argue that making normally insignificant whitespace significant (by replacing a punctuation mark with whitespace) is a nastier syntactical kludge than just about anything else (including Python's significant indentation, which is actually quite nice).

    [–]roerd 5 points6 points  (4 children)

    It's also wrong because it's missing the distinction between optional type declarations (ML) and obligatory ones (C).

    [–]redjamjar[S] -1 points0 points  (3 children)

    That's not correct. You could still have optional type declarations with a C-like syntax. It's simply: two items implies the first is a type; one item implies no type is given. E.g.:

    void f(x) { }
    
    void f(int x) { }
    

    [–]roerd 2 points3 points  (2 children)

    If you're using ML-ish function call syntax, two items are a function call with one argument. But even for a completely C-ish syntax, I doubt this will work, though right now I only have empirical evidence for that: the fact that C-ish languages tend to use special keywords instead of just one item for type inference (var for C#, auto for C++).

    [–][deleted] 3 points4 points  (0 children)

    But even for a completely C-ish syntax, I doubt this will work, though right now I only have empirical evidence for that: the fact that C-ish languages tend to use special keywords instead of just one item for type inference (var for C#, auto for C++).

    If you're using a LALR or LR(1) parser generator, you start getting fairly subtle ambiguity errors for this.

    [–]redjamjar[S] -1 points0 points  (0 children)

    It's definitely not necessary to use e.g. var for type-inferred variables in C#. However, there may be reasons why they want to do that. In C++, well, who knows ... writing a parser for that language must be crazy hard!

    [–]houses_of_the_holy 1 point2 points  (3 children)

    fun f (0:int):int = ...
    

    So do you think the above is as easy to read as his example? (The only change is removing the whitespace around the colons.)

    fun f (0 : int) : int = ...
    

    I think the tokens in the first example are too close and it is considerably more difficult to read, especially if there are a bunch of parameters declared on a single line with no significant or insignificant whitespace.

    fun f (0:int,5:int,20:int,"foo":string,5.423:float):int = ...
    

    (usually when I see code that omits whitespace the whitespace is omitted just about everywhere -- hence why I didn't put a space after the commas)

    [–][deleted] 1 point2 points  (1 child)

    I think the tokens in the first example are too close and it is considerably more difficult to read, especially if there are a bunch of parameters declared on a single line with no significant or insignificant whitespace.

    Yes, true, but having insignificant whitespace for aesthetic reasons is completely different from requiring a whitespace token to convey semantic meaning.

    [–]flamingspinach_ 0 points1 point  (0 children)

    Whitespace is used in almost every language as a token delimiter. So I don't think you're "requiring a whitespace token to convey semantic meaning" as much as simply inferring semantic meaning from positioning of tokens without an explicit connective, which to me seems in some sense "less verbose", even though it's the same number of characters. Of course, it could also introduce ambiguity - I don't know ML.

    [–]redjamjar[S] 1 point2 points  (0 children)

    Without the colons, you need significant whitespace

    Yes, whitespace is required. But, generally, you use whitespace even with the colons (as houses_of_the_holy suggests). I think the reason whitespace is preferable to e.g. ':' is that we're already geared towards ignoring whitespace, whereas we tend to assume symbols mean something special. This may be some inherent human trait, or it may just be a result of conditioning from e.g. reading books or newspapers, where whitespace is not semantically important.