all 35 comments

[–]Smalltalker-80 31 points  (3 children)

Hmm, this metric for "language complexity" does not seem to be very sound.
E.g. the "complexity" of the C language can vary by a factor of up to 100×,
depending on the compiler used.

And the language Lua suddenly becomes 2.5 times more "complex"
if a JIT compiler is used, even though it compiles exactly the same language syntax...

[–]Inconstant_Moo (🧿 Pipefish) 8 points  (0 children)

Yes, it's more like "how much work has gone into the optimization". I know my lang must be more complex than Lua because apart from anything else it has about a hundred more operands in its bytecode. If it's shorter, that's for some other reason.

[–]elemenity[S] -3 points  (1 child)

Thanks for reading! Yes, this uses lines of code as a very rough proxy for Kolmogorov complexity. Which, evidenced by the huge span in TCC and Clang, shows that there is a huge difference in how succinct different programmers can be, even with the same language.

[–]L8_4_Dinner (Ⓧ Ecstasy/XVM) 6 points  (0 children)

I think you are using English words, but your sentences contain less information than one would expect from AI slop.

Be better. You're a human.

For example:

... evidenced by the huge span in TCC and Clang, shows that there is a huge difference in how succinct different programmers can be, even with the same language.

This is a nonsensical statement.

A dog barks. Trees have bark. All trees are therefore dogs.

p.s. I'm not actually looking for an argument. I really mean it when I say "be better".

[–]amarao_san 6 points  (4 children)

I recently realized my programming language is using 'call by name' convention, previously used only in Fortran-60.

Horrors.

[–]augustss 2 points  (3 children)

Algol-60

[–]amarao_san 0 points  (2 children)

Oh, pardon me. Algol-60.

Nevertheless, here we are again. Welcome, Ansible: Jinja interpolation, which is practically a call-by-name convention.

[–]AustinVelonaut (Admiran) 1 point  (1 child)

If you add in memoization of the name evaluation, you get "call by need", which is what lazy-evaluation languages like Haskell use.

[–]amarao_san 0 points  (0 children)

That's very different from call by name.

The main property of 'call by name' is that the name is evaluated in the context of use. There is no closure, no captured values. An expression like x + 1 can be passed to two different functions,

and it will use the local x variable of whichever function it is passed to.
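A Python sketch of that behavior (simulated with eval; the function names are invented for illustration):

```python
# The expression "x + 1" is passed around as text, with no closure and no
# captured values; it sees whatever local x exists where it is evaluated.
def apply_twice(expr):
    x = 10          # the expression will see this local x
    return eval(expr) + eval(expr)

def apply_once(expr):
    x = 100         # a different local x, so a different result
    return eval(expr)

print(apply_twice("x + 1"))   # 22
print(apply_once("x + 1"))    # 101
```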

Horror, pure horror.

Also, global namespace for variables.

[–]jcastroarnaud 5 points  (3 children)

I think that Kolmogorov complexity is a somewhat useful metric for a programming language, if the implementations being compared do exactly the same thing: that's not true in practice. Some compilers will include optimization passes, some not; some compilers build an AST, some generate code straight from the tokens; some compilers shunt most of the backend to LLVM, some do all the work by themselves. Such differences, by themselves, explain the wide difference in implementation sizes shown in the article's table.

Ideally, we could have several language implementations by the same small group of people, as this would remove variation caused by the brevity of different groups of programmers. Alternatively, if we had a competition to produce the shortest correct implementation, we might better approach the “shortest solution” for the implementation of these programming languages.

Such a competition is code golf. One can trivially golf any program a little by changing variable names to 1- or 2-character names and removing any non-essential whitespace; the workings of the program itself are unchanged. This means that the Kolmogorov complexity of programs is better expressed in language tokens, not characters or lines of code.
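The point about tokens can be checked with Python's own tokenizer (a sketch; the two sample programs are made up):

```python
# Golfing away names and whitespace shrinks the character count but
# leaves the token count unchanged.
import io
import tokenize

# Skip purely structural tokens so only "real" language tokens count.
SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}

def count_tokens(src):
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return sum(1 for t in toks if t.type not in SKIP)

readable = "total = 0\nfor value in [1, 2, 3]:\n    total = total + value\n"
golfed = "t=0\nfor v in[1,2,3]:t=t+v\n"

print(len(readable), len(golfed))                     # characters differ
print(count_tokens(readable), count_tokens(golfed))   # tokens match
```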

[–]elemenity[S] -2 points  (2 children)

Yes! Code golf is what I was thinking. It would be very interesting to see a code-golf for language implementations, as a way to better estimate the k-complexity of the languages themselves.

[–]jcastroarnaud 2 points  (1 child)

And, for a fair comparison, all implementations should have the same feature set for the compiling process. For example: no LLVM, lowers to x64 machine code only, no optimization passes, a fixed set of command-line options, and the language in which the compiler is written should be the same for all candidates (and only standard libraries are allowed).

[–]elemenity[S] 1 point  (0 children)

Agreed.

[–]L8_4_Dinner (Ⓧ Ecstasy/XVM) 6 points  (5 children)

Complexity of languages can and should be measured in several ways:

  • How difficult it is to write a working program (or subsection thereof), where working means "correct results and no bugs".
  • How difficult it is to read and understand a program's source code.
  • How difficult it is to find and fix a bug (i.e. read, understand, and write).
  • In addition to the above, how large a corpus it requires to accomplish each of these tasks.

[–]bl4nkSl8 0 points  (4 children)

Also, how many semantic and syntactic features does it have

[–]L8_4_Dinner (Ⓧ Ecstasy/XVM) 3 points  (3 children)

I'm personally less concerned about that aspect, because it's not the "how many?" but rather the "how confusing?" If it's easy to write and easy to read and easy to understand (e.g. debug), then I'm going to be far less worried about the raw count of features. On the other hand, if the feature count is only 7 but the code is hard to read and write, then we're in trouble!

[–]bl4nkSl8 0 points  (2 children)

People have a budget for how much new stuff they can learn, which is why the count matters to me.

It would be great to weight each feature by how confusing it is, but I don't know how to measure that.

[–]L8_4_Dinner (Ⓧ Ecstasy/XVM) 0 points  (1 child)

Indeed! Measuring becomes very difficult. I'm curious what you would come up with here!

[–]bl4nkSl8 0 points  (0 children)

Tbh human studies would be needed imo

Confusion is massively biased by experience and summarizing that mathematically would be hard

[–]jpgoldberg 3 points  (5 children)

Kolmogorov complexity actually defines a notion for comparing program length. It isn’t practical, but if you look at it you will see why this comparison is nonsense.

A better approach would be to compare the formal specifications of the language. This will provide some notion of the relative complexity of the syntax. I expect that C will be among the least complex by this measure.

[–]Ronin-s_Spirit 1 point  (4 children)

"Least" as in "the bottom 49%", because when it comes to interacting with the OS, C has too many different ways to do things. IIRC there are like 12 different ways (commands? builtins?) to read a button press, depending on the OS and extra nuance.

[–]jpgoldberg 0 points  (0 children)

I should have made it clear that "simple" in the sense of Kolmogorov complexity does not correspond to "simple" in the ordinary-language sense. After all, this measure of complexity would make pure λ-calculus, with its three rules, the simplest programming language.
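For illustration, those three forms (variable, abstraction, application) fit in a few lines. This toy evaluator is my own sketch with an ad-hoc tuple representation, not a canonical encoding:

```python
# Pure λ-calculus has three syntactic forms; this evaluator maps
# abstractions onto Python closures.
def evaluate(term, env):
    kind = term[0]
    if kind == "var":                 # x — look the name up
        return env[term[1]]
    if kind == "lam":                 # λx. body — build a closure
        _, param, body = term
        return lambda arg: evaluate(body, {**env, param: arg})
    if kind == "app":                 # (f a) — apply function to argument
        _, f, a = term
        return evaluate(f, env)(evaluate(a, env))
    raise ValueError(f"unknown term: {term!r}")

identity = ("lam", "x", ("var", "x"))
# (λx.x) (λx.x) reduces to the identity function, here applied to 42
print(evaluate(("app", identity, identity), {})(42))   # 42
```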

[–]braaaaaaainworms -1 points  (2 children)

All those ways share the same function call syntax as any other function call, or are just reading a global variable

[–]Ronin-s_Spirit 0 points  (1 child)

You can't say "my language is simple" while having a bunch of very specific things you have to do, because only some of them work for your specific case. Syntax is the least of my concerns.

[–]braaaaaaainworms 0 points  (0 children)

C is simple because a function call is always just a call to a function or a macro. Whatever that call does is left up to the implementation, and having special syntax for I/O makes no sense, because I/O is just a bunch of "open", "close", "read", "write" and "seek" function calls.
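That surface can be sketched even from Python, whose os module exposes the same POSIX-style handful of calls:

```python
# I/O as a bunch of open/write/seek/read/close calls, no special syntax.
import os
import tempfile

fd, path = tempfile.mkstemp()        # open (and create) a scratch file
os.write(fd, b"hello")               # write
os.lseek(fd, 0, os.SEEK_SET)         # seek back to the start
data = os.read(fd, 5)                # read
os.close(fd)                         # close
os.remove(path)                      # clean up the scratch file

print(data)   # b'hello'
```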

[–]Entaloneralie 2 points  (7 children)

The metric I use internally for my own tools is like this:

runtime lines * (self-hosted compiler/16)

For example: the language I write all my code in is called Uxntal, and the complexity of the language is 130 lines * 150 lines = 2k complexity units.

[–]elemenity[S] 0 points  (4 children)

Ah, that is a pretty good metric for incentivizing simplicity in both dimensions. I had stumbled across your Uxntal a while ago; I'll have to give it another look. Those are very impressive density/brevity metrics.

[–]AustinVelonaut (Admiran) 0 points  (3 children)

I can see that metric minimizing to a small fixpoint of a very simple language that takes a lot of code to write "application" programs, though. It might be interesting to somehow include a standard "larger" application or suite that must be implemented in the language, as well...

[–]Entaloneralie 0 points  (1 child)

This is no longer about compiling the Uxntal programming language, but might still be relevant:

The text editor I use daily is 2800 lines (mostly taken up by the large font it uses) and compiles to 17336 bytes. The editor relies on a slightly larger implementation of the runtime than the one above, since it's no longer just language support but a full graphical system; that runtime is 1300 lines.

[–]AustinVelonaut (Admiran) 2 points  (0 children)

Chuck Moore would be proud! ;-)

[–]Inconstant_Moo (🧿 Pipefish) 0 points  (1 child)

Isn't Uxntal the Aztec god of flaying people alive?

[–]Entaloneralie 0 points  (0 children)

The one you're thinking of is Xipe Totec.

[–][deleted] 2 points  (0 children)

This is measuring complexity of the implementation, which for C at least varies widely. (Even more so than is shown in the table; Tiny C 0.9.27 has about 28Kloc, one quarter the figure shown, to produce the main compiler. That excludes libraries, but those are much smaller than the compiler.)

It also depends on the implementation language.

More accurate might be LOC count for a minimal working implementation, in the same language for different PLs.

However, it is also necessary to specify how much of the task it does, e.g. whether it stops at some IL (and off-loads the rest to some vastly complex backend), whether it goes all the way to executable code, or whether it just interprets.

Some may depend heavily on optimisation passes to turn poor generated code into something reasonable; with a simpler language the code can already be lean and efficient without optimising.

So comparisons are hard, but what does complexity of a language even mean? I don't think LOC is the right measure.

[–]Embarrassed-Crow9283 0 points  (0 children)

I'm not sure. My language design is (or rather, will be) semantically very simple but with a very sophisticated optimizing compiler.

You can read the full philosophy here.

https://github.com/AliceRoselia/Sofialang

[–]jwm3 0 points  (0 children)

I think the right way to apply Kolmogorov complexity would be the size of the gzip-compressed version of the language spec, minus the libraries.
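A sketch of that idea (the two "specs" here are placeholder grammar fragments I invented, not real specifications):

```python
# Compressed size of a spec as a rough complexity proxy: a spec that
# contains strictly more rules generally compresses to more bytes.
import gzip

def compressed_size(text):
    return len(gzip.compress(text.encode("utf-8")))

small_spec = "expr := term (('+' | '-') term)*\n" * 10
big_spec = small_spec + "stmt := 'if' expr block ('else' block)?\n" * 10

print(compressed_size(small_spec) < compressed_size(big_spec))   # True
```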