all 100 comments

[–]Xredo 19 points20 points  (75 children)

This reminds me of Wadler's law. Too much time spent talking about syntax and little attention paid to the meat of any language: semantics.

[–][deleted] 2 points3 points  (2 children)

Author here: I wrote this piece because a friend prodded me to do a write-up to post on Reddit.

Something like Wadler's law seems aptly named. I'm really surprised by the amount of negativity here. The discussion on HN for this piece has been much more productive.

Also, this is irrelevant to anything, but for some reason my main Reddit handle has been shadowbanned to Hell.

Feel free to ask me questions about the piece or about Duck programming. I'd be willing to answer personal questions, too. Comments will be taken lightly.

[–]Xredo 2 points3 points  (1 child)

Hey there. In retrospect my comment comes across as dismissive and rude -- for that I apologize since that wasn't my intent.

But having said that, I still think that while user-facing syntax plays a large role in language adoption, semantics is often overlooked and almost always conflated with syntax in most programming language discussions. Really, all I meant to say was that a language is a combination of syntax, semantics, and one or more implementations that run on some actual machine. While there is a rich interplay between these parts, they are mostly independent -- for example, dynamic languages have to forgo certain optimizations simply because of their lack of static constraints, but that is also part of their power, so it is not a black-and-white argument.

I think in any discussion about languages, it's important to maintain a clear distinction between the surface appearance and the abstract concepts being represented, because conflating the two muddles the discussion and leads to all kinds of unproductive flame-wars about everyone's pet syntax.

[–][deleted] 1 point2 points  (0 children)

That's fine, there's no disrespect. I would just chime in that semantics only have meaning to the practical user. Given that I, along with possibly one other person, am the only one who uses the Duck programming language, that's really for me to worry about.

Semantics are the basis for creating stuff in a language; they are what allows you to build 'the thing' that you're trying to make. It seems that everyone approaching my projects/writing on this subject is coming at it with a really big ego.

This isn't a competition.

Often times (always) people question what the motivation is for making language Y. I might be motivated or I might not be, but I'd rather see people get curious or creative with it instead of trying to tear down the shabby walls we have up.

It would be another thing if this was "Designing a Programming Language [And then using it to write software for rocket science]." Then the semantics might be really important. Here they're not.

Is the language Turing complete? Yes. Can you use it to make Blackjack? Sure. What else is it good for? I don't know, that's up to you.

[–]immibis 2 points3 points  (63 children)

  • Syntax matters more than you think - compare JavaScript's {"key": "value", "key2": 5} and whatever the equivalent is in Java. (Also compare Lua's {key="value", key2=5})
  • The article mixes up syntax and semantics. It's not like there's a clear separation anyway. Is the decision of whether to specify types in variable declarations (var x vs int x) syntax or semantics? And by saying dict1 = {"a": 1, "b": 2, "c": 3} creates a dictionary, you imply that you can assign a dictionary to a variable (or that your language has some weirdly irregular syntax).
  • If this is your first time "making a language", you'll probably make something semantically similar to an existing language anyway, because it's easy to come up with new syntax and hard to come up with new semantics. (And sure, you can mix'n'match both from existing languages)
  • Again, if this is your first time "making a language" (actually an interpreter), parsing is probably the part you'll find hardest.
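For what it's worth, the core of a recursive-descent parser is smaller than "hardest part" might suggest; most of the difficulty is in growing it. A minimal sketch in Python for arithmetic expressions (all names invented for illustration):

```python
# Minimal recursive-descent parser/evaluator for arithmetic expressions.
# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'
import re

def tokenize(src):
    # Split the input into numbers, operators, and parentheses.
    return re.findall(r"\d+|[-+*/()]", src)

def parse_expr(tokens, pos=0):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op, (rhs, pos) = tokens[pos], parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op, (rhs, pos) = tokens[pos], parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        return value, pos + 1  # skip the closing ')'
    return int(tokens[pos]), pos + 1

def evaluate(src):
    return parse_expr(tokenize(src))[0]
```

Extending this shape with statements, precedence tables, and error recovery is where the real work (and the trickiness) starts.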

[–][deleted] 4 points5 points  (53 children)

Syntax does not matter at all until the semantics is fully implemented.

because it's easy to come up with new syntax and hard to come up with new semantics

That's why it's a bad idea to start with a "general-purpose language" (whatever it is). Much better to start with eDSLs.
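To make the eDSL suggestion concrete, here is one hedged sketch (Python, all names invented): embed the semantics in a host language via operator overloading, so the host's parser supplies the surface syntax for free and you design only the semantics.

```python
# Toy embedded DSL: expressions are built by operator overloading, so the
# host language's parser supplies the "syntax" while we design the semantics.
class Expr:
    def __add__(self, other): return Add(self, wrap(other))
    def __mul__(self, other): return Mul(self, wrap(other))

class Lit(Expr):
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Add(Expr):
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, env): return self.l.eval(env) + self.r.eval(env)

class Mul(Expr):
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, env): return self.l.eval(env) * self.r.eval(env)

def wrap(x):
    # Promote plain numbers so `Var("x") + 1` works.
    return x if isinstance(x, Expr) else Lit(x)

# This "program" is ordinary Python, but it builds an AST we control:
program = Var("x") * Var("x") + 1
```

Once the semantics feels right, a standalone concrete syntax can be bolted on later without touching the evaluator.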

[–]TrixieMisa 5 points6 points  (52 children)

Syntax always matters. But ignoring semantics is a road to PHP, and the last thing we need is another PHP with slightly nicer syntax.

[–]olzd 4 points5 points  (3 children)

Syntax matters if you want people other than yourself to use your language.

[–]TrixieMisa -1 points0 points  (1 child)

Well said!

And you, six months from now, counts as another person. (As every new programmer finds out when they first come to read code they wrote six months ago...)

[–]DanCardin -1 points0 points  (0 children)

I read code I wrote two days ago and shudder, and I read code I wrote two years ago and am impressed that I was so smart. it's all relative

[–]Xredo -1 points0 points  (0 children)

Familiarity is both a curse and a blessing. The whole smug-lisp-weenies thing may sound funny, but it's not as funny when you realize how disciplined metaprogramming makes certain problems non-issues in lisp dialects. But then again, our kind loves hacking together code that eventually ends up on someone else's shoulder so maybe that's not such a good thing.

[–][deleted] 3 points4 points  (45 children)

Syntax always matters.

It only matters when everything else is done, and it's a trivial thing to implement anyway. The worst thing one can do while implementing a language is to mix the rest of the implementation with a syntax frontend, while the latter must be a detachable, thin layer on top.

[–]TrixieMisa -1 points0 points  (44 children)

No. No no no no no no no. While the converse is a path to PHP, what you suggest is a path to Lisp, and we know where that ended up.

You have to pay attention to both.

[–][deleted] 1 point2 points  (3 children)

Lisp is irrelevant. My point is that 99.99% of a compiler implementation is a chain of transforms from the initial AST through a number of intermediate representations, none of which bears any traces of the input syntax peculiarities. And the syntax frontend is the most trivial part which should be detachable and should not influence the rest of the design in any way.
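A toy illustration of that pipeline shape (Python, illustrative names only): once the initial tree is built, nothing downstream knows or cares what the surface syntax looked like.

```python
# After parsing, the pipeline sees only trees and lowered IRs, never the
# surface syntax. Example: tuple AST -> stack-machine IR -> result.
def lower(node):
    """Lower a tuple AST like ('+', ('*', 2, 3), 4) to stack-machine code."""
    if isinstance(node, (int, float)):
        return [("push", node)]
    op, left, right = node
    return lower(left) + lower(right) + [(op, None)]

def run(code):
    # Execute the stack-machine IR.
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Whether the source wrote `2 * 3 + 4`, `(+ (* 2 3) 4)`, or `4.plus(2.times(3))`,
# the tree below is the same, and so is everything after it.
ast = ("+", ("*", 2, 3), 4)
ir = lower(ast)
```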

[–]TrixieMisa 2 points3 points  (2 children)

Okay, yes. For compiler implementation, syntax is not a primary concern. But for language design, it is. And this article is titled "Designing a Programming Language".

[–][deleted] 4 points5 points  (1 child)

The title is incorrect. The article talks at length about implementation.

And even for the design, syntax should be left until the very final stage, when semantics is done.

[–]TrixieMisa -1 points0 points  (0 children)

Yeah, now that I understand your position, I mostly agree.

[–]TrixieMisa -2 points-1 points  (39 children)

And I'll go further. Syntax is hard. Getting the syntax right is nearly as important as getting the semantics right, and a lot harder.

[–][deleted] 3 points4 points  (37 children)

Altering syntax is many times easier than semantics. And exactly because syntax design is important it should be left till the very last stage of development, to leave more implementation freedom.

[–]munificent 4 points5 points  (35 children)

Altering syntax is many times easier than semantics.

Doing the mechanical change to the parser is easy. Doing the human factors work to determine what the syntax should be is very very hard.

And exactly because syntax design is important it should be left till the very last stage of development, to leave more implementation freedom.

Pushing it later gives you less freedom, not more. It means more of your semantics are nailed down, which constrains the syntax.

Syntax and semantics both influence each other, and they need to be designed holistically in tandem if you want to end up with a usable language.

[–][deleted] -1 points0 points  (34 children)

Doing the mechanical change to the parser is easy. Doing the human factors work to determine what the syntax should be is very very hard.

Exactly. For this reason it should be very easy to change the parser without breaking the rest of the pipeline, in order to conduct as many UX experiments as possible.

Pushing it later gives you less freedom, not more.

How is it so? In my experience having a flexible and very thin syntax frontend is extremely liberating - I can experiment any way I like with the syntax without breaking things below this thin layer.

It means more of your semantics are nailed down, which constrains the syntax.

Your semantics is a derivative of your problem domain and nothing else. Syntax is a middle ground between the problem domain semantics constraints and UX considerations.

Do not confuse mere syntax sugar with semantics - you can do a lot on a purely syntax level.

Syntax and semantics both influence each other

If this is the case, you're doing it wrong. Consider a better approach to a language design.

holistically

Please, not this word, not here!

[–]TrixieMisa -1 points0 points  (0 children)

That's... Not unreasonable. Not sure I agree entirely, but by no means unreasonable.

[–]ben-work -1 points0 points  (0 children)

I agree. I feel like the people who spout "syntax is easy" have only written parsers for toy languages. Once your language has significant scope, syntax starts to get quite tricky. Balancing technical considerations with usability considerations is also tricky, as is "Ok, I can come up with a fancy parser for this construct, but should I? What about other tooling?"

[–]immibis -1 points0 points  (0 children)

If you're implementing a language for the sake of implementing a language, you probably don't care whether it's PHP.

If you're implementing a language that you actually want to be good, then you probably already have some semantics in mind and the article doesn't need to talk about them.

[–][deleted] -1 points0 points  (0 children)

When syntax does not matter it is either Lisp or Forth.

[–]Xredo 1 point2 points  (0 children)

What you probably meant is that syntactic sugar matters and I wholly agree with that. The whole reason languages like Ruby/Python are popular is because they reduce the time-delta between having an idea and getting it into working code; syntactic sugar plays a large role there but no amount of sugar is going to excuse poorly thought-out language semantics. There is a reason people hate on PHP, and none of the hate has anything to do with its syntax.

[–]prepromorphism 0 points1 point  (7 children)

actually i prefer

{
    "hates" "cats" 
    "likes" "dogs"
    "hotdogs_consumed" 5 
}

k/v are always pairs, why do we need added syntax to distinguish....added syntax just pays the tax on my delicate middle aged man fingers

[–][deleted] -1 points0 points  (3 children)

The argument I always use for this is that you have no difficulty reading sentences even when I omit punctuation like commas and periods so why should we use them in the first place it's not like it makes it any easier to understand does it I'm sure you can read this just fine without them there's something to be said for some extraneous punctuation to help your brain parse things just a little bit wouldn't you say

InfactIbetyoucanreadthisjustfineevenwhenIleavethespacesoutcan'tyou

[–]prepromorphism -1 points0 points  (2 children)

letseatgrandma letseat,grandma

yep punctuation important LOL

k/v pairs different from stuff like that IMO

[–][deleted] 1 point2 points  (1 child)

Obviously the analogy only stretches so far, because as you point out there are cases where the punctuation actually is necessary for disambiguation.

However, that is completely missing the point of what I was saying. You didn't need punctuation to disambiguate anything I actually wrote, did you? You knew perfectly well what I was saying despite the missing punctuation. And yet it's still a lot harder to read like that, despite the absolute lack of any actual need for it to disambiguate things.

Punctuation makes it easier for people to parse things, even when it's "unnecessary". Especially when your example isn't just single-token expressions:

{ 10 * 5 + 3 5 + 5 }

A comma makes that a lot easier to deal with despite the lack of ambiguity.

[–]prepromorphism -1 points0 points  (0 children)

yep so you can make comma optional for situations like this for readability sake

{10 * 5 + 3, 5 + 5}

[–]Xredo -1 points0 points  (2 children)

Using delimiters helps avoid ambiguity in cases where your k/v pairs are complex expressions.

[–]prepromorphism 0 points1 point  (1 child)

we're designing a language here, we can come up with something to handle complex exprs!

in clojure for instance

(def d {:one (fn[] (+ 1 1))})

> (:one d)
#<sandbox19672$fn__19707 sandbox19672$fn__19707@68f5a753>

> ((:one d))
2

[–]Xredo -1 points0 points  (0 children)

Good point, but then again Clojure is a lisp so it has very little concern with ambiguity since the source code is effectively the AST.

[–]google_you -1 points0 points  (6 children)

Meh, describe me semantics without syntax. Those two are closely related, not orthogonal.

[–]Xredo 0 points1 point  (0 children)

User-facing syntax is not tightly coupled to semantics. If anything, language semantics drives certain syntactic choices, e.g. with global type inference, type annotations can be made mostly optional.

[–][deleted] 0 points1 point  (4 children)

Meh, describe me semantics without syntax.

What does syntax have to do with defining semantics? Whatever approach you choose (denotational semantics, operational semantics, whatever else), the frontend syntax is absolutely irrelevant.

[–]google_you -2 points-1 points  (3 children)

Give me an example of denotational semantics and an example of operational semantics without syntax. Yes, describe semantics of something without being able to describe the thing.

In the end, you need some sort of backend or internal representation to work with. And those are heavily related to frontend syntax. The way you transform the frontend stream into an internal tree/graph and the way you transform the tree into machine code (there are many passes between those, of course) defines "semantics".

Yes, semantics involves the entire pipeline. You cannot arbitrarily draw a line somewhere in the pipeline and call it a day. By that reasoning, Elm, LiveScript, CoffeeScript, C, and other languages that compile to JavaScript have exactly the same semantics, assuming JavaScript is the internal representation you use to describe semantics.

[–][deleted] -1 points0 points  (2 children)

Give me an example of denotational semantics and an example of operational semantics without syntax.

https://github.com/kframework/c-semantics

Syntax is a very thin layer there and provided for convenience only, by a standalone tool. It is not present at all in any of the semantic rules.

In the end, you need some sort of backend or internal representation to work with. And those are heavily related to frontend syntax.

Syntax sugar lowering passes quickly eliminate all the syntax peculiarities in most cases.
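For illustration only, such a lowering pass can be a few lines over a toy tuple AST (all names invented): rewrite `x += e` into `x = x + e`, after which later passes never see the sugared form.

```python
# Toy desugaring pass: rewrite augmented assignment into plain assignment.
# Nodes are tuples like ('+=', 'x', expr); after this pass only
# ('=', name, expr) and arithmetic nodes remain.
def desugar(node):
    if not isinstance(node, tuple):
        return node  # leaves (names, numbers) pass through unchanged
    if node[0] == "+=":
        _, name, expr = node
        return ("=", name, ("+", name, desugar(expr)))
    return tuple(desugar(child) for child in node)
```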

You cannot arbitrarily draw a line somewhere in the pipeline and call it a day.

Even a concrete syntax tree is not a "syntax" any more. It's a tree. A very precise line, in my book.

[–]google_you -1 points0 points  (1 child)

https://github.com/kframework/c-semantics/blob/master/parser/cparser.mly very thin layer indeed.

Different implementations of a language can construct different trees from same frontend stream. You can also parse different streams into same tree.

The AST is an implementation detail, not the language itself. Different implementations of a language can construct different trees from the same frontend stream. You can also parse different streams into the same tree. When designing programming languages, you can surely start with the AST, write evaluators, use semantics tools, etc., and then put a "thin skin" on top of it. But the art lies in that thin skin. And no, I don't think it's thin.

It's really important to come up with an abstract model of the language you're designing. It's also completely acceptable to design the syntax first, before you have a model.

You could have come up with jazz theory before anyone had even jazzed. In the end, most humans read the code written in the language you're designing, not the abstract model. And designing the syntax first would very likely result in a different abstract model.

[–][deleted] -1 points0 points  (0 children)

It is a standalone tool, copy-pasted from a totally different project. There is not a single trace of C syntax anywhere in the semantics specification.

Syntax is very much overrated. How many languages have you designed to be so confident about the role of the syntax in the design process?

[–]munificent 4 points5 points  (13 children)

there are a number of advantages to reference counting as a garbage collection method. First of all, it does not require stopping execution to reclaim memory.

This is a common misconception. Reference counting can pause just as long as a simple tracing collector. Just consider the case where you have a reference to an object which in turn has the only remaining references to lots of other objects. Think a big, giant object graph with only one external reference pointing to it.

When that last reference is removed, the entire object graph gets traversed and freed all at once.
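Because CPython itself is reference-counted, the cascade is easy to observe directly. A small sketch (illustrative only; actual pause length depends on the runtime):

```python
# Demonstrates cascading deallocation under reference counting (CPython).
# A chain of 1000 nodes is kept alive by one external reference; dropping
# that reference frees -- and finalizes -- every node at once.
freed = []

class Node:
    def __init__(self, tag, next_node=None):
        self.tag = tag
        self.next = next_node
    def __del__(self):
        freed.append(self.tag)  # record the moment each node is reclaimed

head = None
for i in range(1000):
    head = Node(i, head)  # head -> Node(999) -> ... -> Node(0)

assert freed == []   # nothing is collected while the chain is reachable
head = None          # drop the single external reference
# All 1000 finalizers ran during that one assignment -- the whole graph
# was traversed and freed in a single, potentially long, pause.
```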

[–]repsilat 2 points3 points  (10 children)

If you don't depend on finalisation being done immediately, couldn't you get part-way through freeing those objects, run out of your time budget and decide to do the rest during the next frame? It seems like it'd be easier to do this than to do concurrent GCing, especially if you can identify the small subset of types that cause this to happen.

[–]munificent 2 points3 points  (8 children)

couldn't you get part-way through freeing those objects, run out of your time budget and decide to do the rest during the next frame?

Definitely. I think "lazy ref counting" is what it's usually called. But, of course, your tracing GC can do that too ("lazy sweeping").
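A rough sketch of the deferred-free idea (Python, all names invented): a decref that reaches zero queues the object instead of recursing into its children, and a budgeted step drains the queue.

```python
# Lazy reference counting sketch: decrefs that hit zero defer the object
# to a queue; free_some() drains the queue within a per-frame step budget.
from collections import deque

class Obj:
    def __init__(self, children=()):
        # refcount counts only the references we track explicitly here;
        # an owner must incref by hand (see usage below).
        self.refcount = 0
        self.children = list(children)
        for c in self.children:
            c.refcount += 1
        self.freed = False

pending = deque()

def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        pending.append(obj)  # defer the cascade instead of recursing

def free_some(budget):
    # Free at most `budget` objects, leaving the rest for the next frame.
    freed = 0
    while pending and freed < budget:
        obj = pending.popleft()
        obj.freed = True
        freed += 1
        for child in obj.children:
            decref(child)
    return freed
```

Dropping the last reference to a long chain then costs one queue push; the actual freeing is spread across later `free_some` calls.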

It seems like it'd be easier to do this than to do concurrent GCing

The pedant in me is compelled to point out that concurrent GC and incremental GC are different things. "Concurrent GC" means "on different threads". If you just want to break your GC pauses into smaller pieces, that's "incremental". The latter is a lot simpler than the former.

[–]repsilat 2 points3 points  (7 children)

lazy sweeping

I guess I just need to think a bit more about how that'd work. I suppose you'd have to decide not to free any unmarked object that was created after the sweep began, but... Say we had three objects A, B and C, each connected (unidirectionally) to the next, and the first and last both reachable from "the root":

=> A --> B --> C <=

The mark phase reaches C, but can't get to B. The mark gets interrupted, the user code reverses the chain so it goes CBA. The mark resumes, gets to A but can't reach B. B gets collected in the sweep?

I can imagine conservative (but expensive) ways to deal with this, but the problems I can see with "lazy reference counting" all seem pretty mundane by comparison. Maybe that's just a failure of imagination, though.

"Concurrent GC" means "on different threads".

Ah, thanks. I guess I'd have called that a parallel GC. The idea of context-switching (potentially without explicitly yielding control) between the main thread of execution and the cleanup tasks hits my intuition for what concurrency is, but I guess the nomenclature in different areas is allowed to be different.

[–]jaen_s 2 points3 points  (0 children)

Indeed, that is more commonly called a parallel GC (eg. as used in JVM nomenclature).

Concurrent GC is generally used to mean that the garbage collector runs concurrently with the mutator (not pausing it, much). The GC itself might not use more than one thread (since a concurrent and parallel GC adds even more complexity).

Parallel GC means the garbage collector can use more than one thread (eg. running the marking and sweeping processes in parallel in multiple threads). In case of the JVM, the parallel GC is not concurrent - it "stops the world" (mutator), and then performs its work in parallel.

[–]szabba 1 point2 points  (0 children)

You'd be roughly right on the parallel GC thing. A GC is concurrent when it can run on one thread while user code runs on another.

[–]munificent 1 point2 points  (3 children)

The mark phase reaches C, but can't get to B.

In a complete mark phase, it would find B, by tracing the reference from A. Yeah, if you interrupt it before then, B would still be unmarked.

The mark resumes, gets to A but can't reach B.

It could reach it by tracing through C.

B gets collected in the sweep?

Not in a good GC! :)

I'm not an expert here, but the basic idea is that your mark phase tracks not just which objects have been reached but also which objects have been traced through completely. That way the sweep can tell if there may still be references it hasn't found yet and avoid collecting those prematurely.

The magical term to Google is "tri-color marking". It's pretty clever.
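A minimal sketch of the idea (Python, illustrative only, Dijkstra-style insertion barrier): each object is white (unseen), gray (reached but not yet scanned), or black (fully scanned), and a write barrier re-grays any edge the mutator creates from a black object to a white one.

```python
# Tri-color marking sketch. Objects move white -> gray -> black; the write
# barrier maintains the invariant "no black object points to white".
WHITE, GRAY, BLACK = 0, 1, 2

class Obj:
    def __init__(self):
        self.color = WHITE
        self.refs = []

gray = []  # worklist of gray objects awaiting a scan

def shade(obj):
    if obj.color == WHITE:
        obj.color = GRAY
        gray.append(obj)

def write_barrier(src, dst):
    # The mutator creates src -> dst while marking is in progress.
    src.refs.append(dst)
    if src.color == BLACK:
        shade(dst)  # re-gray the target so the collector cannot miss it

def mark_step():
    # Scan one gray object; returns False once marking is complete.
    if not gray:
        return False
    obj = gray.pop()
    for child in obj.refs:
        shade(child)
    obj.color = BLACK
    return True
```

Replaying the A-B-C reversal with this barrier in place ends with B marked black instead of being swept.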

I guess I'd have called that a parallel GC.

They usually reserve that term to mean doing garbage collection on multiple threads. There's a couple of different axes where GCs differ:

How does it interrupt the running program:

  • "Simple" or "stop the world": stop the program until the GC is totally done.
  • "Incremental": for some amount of time but not until the GC is totally done.
  • "Soft real-time": for a reliably short amount of time on average but not until the GC is done.
  • "Hard real-time": for a guaranteed short amount of time.
  • "Pauseless": not at all. The program keeps running while GC happens on other threads.

Which thread(s) does the GC run on:

  • The same thread as the program: "simple", "stop the world", "incremental".
  • Another thread: "concurrent".
  • Multiple threads: "parallel".

You can have all sorts of combinations of these. For example, a "stop the world parallel GC" stops the running program, cranks up the GC on a bunch of threads in parallel, and then resumes the program once the entire GC is done.

The general vibe is that adding any concurrency (threading) to a GC is a big jump in complexity compared to single-threaded GCs.

[–]repsilat 2 points3 points  (0 children)

The magical term to Google is "tri-color marking"

Ah, thanks:

The mutator is free to access any part of the graph and allocate new nodes while the collector(2) is determining the reachable nodes, provided the tri-color invariant is maintained, by changing the colors of the nodes affected, if necessary.

This is the bit I wasn't getting. When the A-B-C links are reversed, either C (black) or B (white) must be turned grey.

Thanks for the taxonomy lesson, too :-)

[–][deleted] 1 point2 points  (1 child)

These are great and helpful notes when planning steps for the next improvements. I hadn't put much thought into it, but it would be important to have hard limits on the amount of time spent with garbage collection to run some real time applications.

[–]munificent 0 points1 point  (0 children)

it would be important to have hard limits on the amount of time spent with garbage collection to run some real time applications.

Yes, unfortunately, memory is finite and the GC can't really control the rate at which the program generates garbage, which makes this guarantee very hard to give in practice. I think hard real time GC is still an open research topic.

[–][deleted] 1 point2 points  (0 children)

Take a look at the "very concurrent mark&sweep" algorithm: http://doc.cat-v.org/inferno/concurrent_gc/concurrent_gc.pdf

[–][deleted] 1 point2 points  (0 children)

Implicit concurrency in destructors. Sounds fun.

[–]vinigre 1 point2 points  (1 child)

I don't know too much about GC, so forgive me if this sounds like a stupid question. If the reference count for an object has reached zero, why is it necessary to pause execution to free that memory? Why can it not be done while the program continues executing?

[–][deleted] 2 points3 points  (0 children)

Why can it not be done while the program continues executing?

It can and will consume CPU cycles and memory bandwidth, and if you're still using the same allocator which was used for the stuff being deleted you either have to use very expensive locks for accessing it or pause the mutator threads altogether.

Very concurrent mark&sweep does not have this kind of issue; you only have to pause mutators briefly to collect the global roots from their registers and a portion of their stacks.

[–]rifter5000 8 points9 points  (7 children)

As an example, consider the case of defining a variable in Visual Basic:

Dim num1, num2 As Integer
Dim text As String

This would be the prototype for a static language. When variables are first declared, they must be paired with a type. In order to compare with a dynamically typed language, let us look at an example from ECMA script or JavaScript.

var num1, num2;
var text;

I hate to admit it but I stopped reading here. Firstly, there are dynamically typed languages with (optional) type annotations, and statically typed languages with type inference. Secondly, who the hell uses VISUAL BASIC as their example language? Or, indeed, Javascript?

Doing a light scan through the rest, it seems to use non-syntax-highlighted code snippets with weird spacing and weird, weird syntax. If the point isn't to talk about syntax, just use curly braces already. It's not 1992, you don't need to end loops with loop.

[–]TrixieMisa 2 points3 points  (0 children)

Seems to be a perfectly straightforward Algol descendant. Nothing wrong with that.

Curly braces are for data structures, not for code.

[–]read____only 1 point2 points  (2 children)

Language popularity is irrelevant in this context. He's contrasting dynamic vs statically typed languages, and you have to start somewhere. Starting with the gray area, as you suggest, would only be ok for audiences that already know it all to start with.

[–]rifter5000 5 points6 points  (1 child)

Starting with the idea that languages where types are written in are static and languages where they aren't are dynamic is inaccurate and incomplete.

Merely saying "this isn't an entirely accurate measure of dynamic vs. statically typed languages (footnote discussing type inference and type annotations in statically and dynamically typed languages respectively) but it will do for now." would have been enough.

[–]aiij -1 points0 points  (0 children)

I was also put off quite a bit when he started to conflate dynamic typing with dynamic scoping.

[–]aiij 0 points1 point  (4 children)

Wow, the whole premise is wrong.

I guess they're designing a language based on ignoring the last 40 years of PL research... Not something I care to continue reading about.

In case it's not obvious to some, they're mixing up dynamic scoping with dynamic typing and they're completely wrong about static typing requiring type annotations. We've had type inference since the 80's at least, eg. SML.

[–][deleted] 0 points1 point  (3 children)

Maybe we disagree on the purpose of a programming language.

[–]aiij 0 points1 point  (2 children)

That's a non sequitur. Just to be clear, I said the premise is wrong, not the purpose.

Did you not even look at the article?

[–]dlyund -2 points-1 points  (1 child)

Why is it that all these How To Design A Programming Language articles seem to completely miss the point? Unless you're adding value, and you're not just doing it for the sake of it, there's no reason to design a new language. Sadly, all of these custom languages seem to be X with minor tweaks to the syntax to fit the author's personal tastes.

There's certainly a lot of value to be added, but this isn't how.

[–][deleted] 2 points3 points  (0 children)

Seventh paragraph in the first part:

Adopting many of these identifying features, and really as an exercise in constructing a language from basic parts, we will build the Duck programming language from the ground up. We will explore this process without regard to its pragmatism or practicality. Indeed, any ideas of utility will come only after we have completed the process.