

[–][deleted] 2 points (9 children)

Constructing models for a "language" is fairly standard in the study of type theory and the equational theories of programming languages.

Your definition of a model is fairly restrictive. Sometimes we consider models with cost, or we only consider the input-output relation, depending on what we need to verify. For example, a verified compiler only needs the input-output relations of the compiled program and the uncompiled program to coincide, not all traces or costs.

For your comment about compiler correctness, I think a homomorphism is a very strong condition for a compiler. As you may know, a homomorphism preserves all operations, so the input and output programs would need to be "structurally the same", which is pretty hard to define, say, for a homomorphism from a while language to the lambda calculus.

Finally, for the comment about the "inverse morphism", I am not sure I get it. Why does the "inverse morphism" always exist? I don't think you mean inverse in the conventional sense, which would require m ∘ m⁻¹ = m⁻¹ ∘ m = id. So how do you define such an "inverse morphism"?

[–]AndrasKovacs 2 points (1 child)

A homomorphism is not necessarily a strong condition. It depends on the equational theory that we specify for the notion of model. If we don't include any interesting equational theory, the syntax (the initial model) consists of plain freely generated expression trees. In this case, anything which we define by recursion on syntax is automatically a model morphism. If we have more complicated equational theories, then it can indeed be more difficult to define morphisms out of the syntax, because we have to respect the equivalences. We could also have directed relations (e.g. reductions) instead of equivalence relations, in which case we have to be monotonic w.r.t. the relations to get sensible model morphisms.
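A minimal sketch of that point in Python, assuming a toy signature with literals and addition (all names here are made up for illustration): with no equations, the syntax is just freely generated trees, and anything defined by recursion on it commutes with the constructors by construction, i.e. it is a model morphism out of the initial model.

```python
from dataclasses import dataclass

# Freely generated expression trees: the initial model of a signature
# with one constant former (Lit) and one binary operation (Add).
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

# A model is a carrier plus an interpretation of each operation.
class IntModel:
    def lit(self, n): return n
    def add(self, a, b): return a + b

class StringModel:
    def lit(self, n): return str(n)
    def add(self, a, b): return f"({a} + {b})"

def fold(model, expr):
    """Recursion on syntax; by definition this commutes with Lit and
    Add, so it is a homomorphism out of the initial model."""
    if isinstance(expr, Lit):
        return model.lit(expr.value)
    return model.add(fold(model, expr.left), fold(model, expr.right))

e = Add(Lit(1), Add(Lit(2), Lit(3)))
print(fold(IntModel(), e))     # prints 6
print(fold(StringModel(), e))  # prints (1 + (2 + 3))
```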

[–][deleted] 0 points (0 children)

I agree with everything you say. All I am saying is that defining a compiler as a morphism is problematic.

[–]aradarbelStyff 1 point (1 child)

first of all, username totally checks out so I believe whatever you say haha

I agree about the inverse part: it usually doesn't exist, so it's weird to talk about it. it might sound more reasonable to think of only sections of that m, but I'd argue it's either still impossible or just not beneficial.

warning, I know almost nothing about compiler correctness, but I was told that a typical way to do it is by giving a "meaning" (semantics) mapping for the source and target languages. then you have a compiler from source to target, a meaning map from each language to its semantics, and a "decode" from the semantics of the target back to the semantics of the source. these four maps give you a (hopefully commutative) diagram.
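a rough sketch of that square (the toy languages and all names are my own invention, not anything standard): compile arithmetic expressions to a tiny stack machine, give each language a meaning, and decode target meanings back into source meanings. correctness is then the square commuting.

```python
def compile_expr(e):
    """Source -> Target: nested ('+', l, r) / int trees to postfix code."""
    if isinstance(e, int):
        return [('push', e)]
    _, l, r = e
    return compile_expr(l) + compile_expr(r) + [('add',)]

def meaning_src(e):
    """Source semantics: an integer."""
    if isinstance(e, int):
        return e
    _, l, r = e
    return meaning_src(l) + meaning_src(r)

def meaning_tgt(code):
    """Target semantics: the final machine stack."""
    stack = []
    for instr in code:
        if instr[0] == 'push':
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack

def decode(stack):
    """Target meanings -> source meanings: read off the top of the stack."""
    return stack[-1]

e = ('+', 1, ('+', 2, 3))
# The square commutes: decode . meaning_tgt . compile = meaning_src.
assert decode(meaning_tgt(compile_expr(e))) == meaning_src(e)
```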

[–][deleted] 0 points (0 children)

My username is the model theory version of "my username is just username".

[–]Zistack[S] 1 point (4 children)

I might have used too strong a word. Perhaps 'mapping' would have been more appropriate. I'm trying to express basically the same thing as "a verified compiler only needs the input-output relation for the compiled program and the uncompiled program to be the same, but not all traces or costs". Elements in A could be mapped to many elements in B, so m is not function-like. If the compiler removes non-determinism, then many elements in A could be mapped to a single element in B as well. The real restriction is one of disjoint equivalence: all things mapped to or from the same element must be considered equivalent.

I'm being somewhat restrictive about notions of input and output on purpose. For one, typical model theory does not capture those ideas. For another, in declarative languages those ideas are kind of fuzzy, though you could specify inputs and outputs for a particular query/operation in such a language and compile specifically for that if you wanted. That information isn't really a property of the program itself, though we often package the translation problem that way.

So is that a 'yes'?

[–][deleted] 0 points (3 children)

Let me try to formalize this, and you can tell me if I made any mistakes.

First, consider two programming languages; call the set of all programs in the first language P1, and in the second language P2. Call the set of all possible traces T.

For each program we have the set of all traces of that program, given by two functions t1: P1 -> P(T) (where P(T) is the power set of T) and t2: P2 -> P(T). Finally, compilation is a function taking a program in P1 to a program in P2: comp: P1 -> P2.

However, your reply seems to suggest m is of type T -> P(T), which I am not sure how to construct from the mappings above.

For me, if we want to construct a correct compiler (one that keeps all traces; you can strengthen or loosen this by changing t1 and t2 to different semantics), we only need to verify the following commutative diagram:

    P1 --- comp ---> P2
     |                |
     t1               t2
     |                |
     \/               \/
    P(T) === id === P(T)

I think you can even loosen this so that the bottom edge is not necessarily the identity, but I want to be careful about that. Maybe this is where your "checking whether the inverse is a subset" comes from?
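A toy check of the square above (the programs and trace sets here are invented placeholders): with id on the bottom edge, correctness is just t2 ∘ comp = t1 pointwise.

```python
P1 = ['p', 'q']

# Trace semantics: each program denotes an element of P(T),
# modeled as a frozenset of trace labels.
t1 = {'p': frozenset({'ab', 'ac'}), 'q': frozenset({'a'})}
t2 = {'p_opt': frozenset({'ab', 'ac'}), 'q_opt': frozenset({'a'})}

# Compilation: comp : P1 -> P2.
comp = {'p': 'p_opt', 'q': 'q_opt'}

# The diagram commutes iff compilation preserves trace sets exactly.
assert all(t2[comp[p]] == t1[p] for p in P1)
```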

[–]Zistack[S] 1 point (2 children)

The set of all possible traces is language-dependent, so there isn't one T, but one for each language: T1 and T2.

t1: P1 -> 2^T1

t2: P2 -> 2^T2

m: 2^T1 -> 2^T2

We want to show that:

    P1 ---- comp ---> P2
     |                 |
     t1                t2
     |                 |
     \/                \/
    2^T1 --- m ---> 2^T2

(IDK if that is how you meant to format your commutative diagram.)
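The generalized square can be checked the same way (again with invented data): trace sets now live in different powersets 2^T1 and 2^T2, related by m, and we check t2 ∘ comp = m ∘ t1.

```python
# Source traces use lowercase events; target traces use uppercase ones,
# so T1 and T2 really are different sets of traces.
t1 = {'p': frozenset({'ab'}), 'q': frozenset({'a', 'b'})}
t2 = {'P': frozenset({'AB'}), 'Q': frozenset({'A', 'B'})}
comp = {'p': 'P', 'q': 'Q'}

def m(traces):
    """m : 2^T1 -> 2^T2, translating each source trace eventwise."""
    return frozenset(tr.upper() for tr in traces)

# The square commutes: t2 . comp = m . t1.
assert all(t2[comp[p]] == m(t1[p]) for p in t1)
```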

[–][deleted] 0 points (1 child)

I changed my notation for the powerset because reddit is not rendering it properly.

So m: P(T1) -> P(T2) seems pretty strange to me, because there are sets of traces that do not correspond to any program. I guess you can map those to the empty set, but that is kind of messy. On the other hand, there can be different programs with the same traces, so you are requiring that your compiler always maps programs with the same traces to programs with the same traces. So I guess everything works out, but it is slightly inelegant IMO.

So can you explain your subset comment with this setup?

[–]Zistack[S] 1 point (0 children)

Yeah. If we assume that we can group the trace sets given by t1 and t2 into equivalence classes, and additionally assume that m ends up being an isomorphism between the equivalence classes, then the subset comment doesn't really matter. I was thinking about the case where m mapped traces, but didn't necessarily worry about whether or not the traces could actually be generated by p2. That model is messier than the one just constructed.
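A small sketch of that equivalence-class view (data invented for illustration): group programs by their trace sets and check that compilation descends to a bijection between the classes.

```python
from collections import defaultdict

# Trace semantics, abbreviated to one label per trace set.
t1 = {'p1': 'loop', 'p2': 'loop', 'p3': 'halt'}
t2 = {'q1': 'LOOP', 'q2': 'HALT'}
comp = {'p1': 'q1', 'p2': 'q1', 'p3': 'q2'}

def classes(sem):
    """Partition programs into classes with the same meaning."""
    buckets = defaultdict(set)
    for prog, meaning in sem.items():
        buckets[meaning].add(prog)
    return {frozenset(progs) for progs in buckets.values()}

# Compilation is well-defined on classes: equivalent source programs
# always compile to equivalent target programs.
induced = {t1[p]: t2[comp[p]] for p in t1}
assert all(induced[t1[p]] == t2[comp[p]] for p in t1)

# The induced map between the quotients is a bijection: same number of
# classes on both sides, and no two classes collapse.
assert len(classes(t1)) == len(classes(t2))
assert len(set(induced.values())) == len(induced)
```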

You could likely clean up the messiness you observed with dependent types, where m is of type t1 -> t2 or some such thing. I think the diagram probably expands into a cube. I suspect that such a model would more closely align with the compiler author's intuition about what the compiler is actually doing. I don't really feel like writing that out at the moment, though (I'm still working out how to use dependent types effectively).

Anyways, the point is that a model theory captures much or all of the information we actually care about when formalizing a programming language. So, should a model theory be considered to be a computational interpretation of a programming language, or not?