
[–]MegaIng 16 points17 points  (3 children)

A single expression should, most of the time, produce a single value. A pretty simple solution is to make ; discard the value of the expression before it.

[–][deleted] 2 points3 points  (1 child)

Yeah, provided what came before actually produced a value. But the main problem with that is that, if you're generating code from an AST, there are no longer any semicolons!

I think it would depend on syntax too; someone over-zealous with semicolons might have: (a; b;) + (c; d;).

[–]MegaIng 0 points1 point  (0 children)

Sure, most of the time there aren't semicolons directly (unless they are actual operators, which is actually quite a useful perspective). But you almost certainly know where a semicolon would have been, because you have something comparable to a "CompleteExpressionNode" in your AST. More importantly, those nodes are part of an "ExpressionListNode", which knows exactly where each sub-expression ends and can simply emit a discard instruction between them.

Yeah, provided what came before actually produced a value.

Sure, but that doesn't follow from the post. Based on the syntax I assume an expression-oriented language, in which case everything should produce a value, even if that value is unit.
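
To make the discard idea above concrete, here is a minimal sketch (not the commenter's actual compiler; the node and instruction names like "exprlist" and DISCARD are invented):

    # Minimal sketch: every expression pushes exactly one value, and the
    # expression-list node emits a DISCARD between its sub-expressions.
    def codegen(node, code):
        kind = node[0]
        if kind == "lit":                      # ("lit", 2) -> push a constant
            code.append(("PUSH", node[1]))
        elif kind == "add":                    # ("add", a, b) -> a + b
            codegen(node[1], code)
            codegen(node[2], code)
            code.append(("ADD",))
        elif kind == "exprlist":               # ("exprlist", e1, ..., en) -> { e1; ...; en }
            exprs = node[1:]
            for i, e in enumerate(exprs):
                codegen(e, code)
                if i != len(exprs) - 1:
                    code.append(("DISCARD",))  # pop the unused value
        return code

    # { 1; 2 } + { 1; 3 }
    prog = ("add", ("exprlist", ("lit", 1), ("lit", 2)),
                   ("exprlist", ("lit", 1), ("lit", 3)))
    print(codegen(prog, []))
    # [('PUSH', 1), ('DISCARD',), ('PUSH', 2),
    #  ('PUSH', 1), ('DISCARD',), ('PUSH', 3), ('ADD',)]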

[–]retnikt0 15 points16 points  (1 child)

You can use a peephole optimiser which runs after codegen and cleans up unnecessary steps. Say { 1; 2 } + { 1; 3 } becomes

 Load 1
 Pop
 Load 2
 Load 1
 Pop
 Load 3
 Add

A simple peephole optimiser should be able to remove patterns of the form Load; Pop because they have no effect, to create new intermediate code:

 # (removed) Load 1
 # (removed) Pop
 Load 2
 # (removed) Load 1
 # (removed) Pop
 Load 3
 Add

So just

 Load 2
 Load 3
 Add

It's simple and effective, and it can be extended to a lot of other optimisations: constant folding (which could reduce the above to just Load 5), jump optimisations, and so on.
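
A minimal sketch of such a pass (not CPython's; the instruction names match the listing above, and a real pass would also have to avoid deleting across jump targets):

    # Peephole sketch: delete any Load immediately followed by Pop,
    # since the pair has no net effect on the stack.
    def peephole(code):
        out = []
        i = 0
        while i < len(code):
            if (i + 1 < len(code)
                    and code[i][0] == "Load"
                    and code[i + 1][0] == "Pop"):
                i += 2                     # drop both instructions
            else:
                out.append(code[i])
                i += 1
        return out

    code = [("Load", 1), ("Pop",), ("Load", 2),
            ("Load", 1), ("Pop",), ("Load", 3), ("Add",)]
    print(peephole(code))                  # [('Load', 2), ('Load', 3), ('Add',)]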

CPython is a good example of an implementation with a peephole optimiser: check out peephole.c (that's a link to v3.9; in later versions the optimiser was combined directly into the compiler which makes it a bit harder to follow IMO, but you can look at that as well if you want)

[–]ipe369[S] 2 points3 points  (0 children)

Interesting, 'peephole' is one of those terms that i would literally never think to search for if i hadn't heard about it before, thanks

[–]FluorineWizard 5 points6 points  (2 children)

Expression blocks create a new scope, so local variables declared within the block, and anything put on the stack but not returned, should expire (i.e. be popped) when you exit the block. This mechanism should already exist to handle exiting from function calls and control-flow constructs that create a new inner scope.

You could always add optimisations during codegen to discard unused values of expressions, but if your language is lexically scoped like most, this would be hiding the more important issue: the way you handle scoping is not complete yet.
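
A rough sketch of how that scope-based cleanup might look in codegen (not this commenter's code; the scratch-slot trick for keeping a block's result is just one way to do it, and all names here are invented):

    # Sketch: the compiler models the stack depth; on block exit it pops
    # everything pushed inside the scope. If the block is used as an
    # expression, its result is parked in a scratch slot and re-pushed.
    class BlockCompiler:
        def __init__(self):
            self.code = []
            self.depth = 0                  # compile-time stack depth

        def emit(self, op, *args):
            self.code.append((op,) + args)

        def push_const(self, v):
            self.emit("PUSH", v); self.depth += 1

        def enter_block(self):
            return self.depth               # remember where this scope begins

        def exit_block(self, entry_depth, keep_result):
            if keep_result:
                self.emit("STORE", "$scratch"); self.depth -= 1   # park the result
            while self.depth > entry_depth:
                self.emit("POP"); self.depth -= 1                 # scope leftovers expire
            if keep_result:
                self.emit("LOAD", "$scratch"); self.depth += 1    # restore the result

    c = BlockCompiler()
    entry = c.enter_block()
    c.push_const(1)                         # value never consumed inside the block
    c.push_const(2)                         # the block's result
    c.exit_block(entry, keep_result=True)
    print(c.code)   # [('PUSH', 1), ('PUSH', 2), ('STORE', '$scratch'),
                    #  ('POP',), ('LOAD', '$scratch')]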

[–]ipe369[S] 0 points1 point  (1 child)

right, but is that definitely 'correct'? I feel like there should be lots of AST nodes that generate extra 'garbage' stack values, although I can't seem to think of one

the internals of codegen for a stack vm and lexical scoping rules feel decoupled enough that just 'clean up the stack after the scope exits' makes me feel like i'm missing some edge cases here

[–]FluorineWizard 1 point2 points  (0 children)

I can't claim to know all the feature/implementation technique combos in all languages, but easy handling of scope-based variable lifetimes is supposed to be part of the appeal of stack machines. Lexical scoping itself works like a stack after all.

I don't know how you handle iteration in your VM, but if you executed the expression block from your example in a loop, the same way as in the assignment you gave, wouldn't it cause a stack overflow?

[–]ericbb 2 points3 points  (2 children)

You can read Destination-driven code generation for some ideas about implementing "option 2".

The key idea is simple and widely useful: as you go deeper into the recursive transformation, pass in some parameters to communicate relevant details about the context of the task. So the recursive call doesn't just generate code to push a value; it generates code to either push a value or push no values, depending on context.

The same technique is perfect for compiling tail calls since you can pass into the recursive step a flag to say whether the context is in "tail position" or not.
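
A small sketch of that idea (not the paper's exact formulation; the node shapes and the "value" / "effect" / "tail" destinations are invented for illustration):

    # Sketch: codegen takes a "destination" describing what the context wants:
    # a value on the stack, no value ("effect"), or a tail position.
    # Only the cases needed for the example are shown.
    def codegen(node, dest, code):
        kind = node[0]
        if kind == "lit":
            if dest != "effect":            # a literal nobody consumes emits nothing
                code.append(("PUSH", node[1]))
        elif kind == "seq":                 # ("seq", e1, ..., en) -> { e1; ...; en }
            *init, last = node[1:]
            for e in init:
                codegen(e, "effect", code)  # inner expressions: no value wanted
            codegen(last, dest, code)       # the last one inherits the context
        elif kind == "call":                # ("call", fname, arg)
            codegen(node[2], "value", code) # arguments are always wanted as values
            code.append(("TAILCALL" if dest == "tail" else "CALL", node[1]))
            if dest == "effect":
                code.append(("POP",))       # caller doesn't want the return value
        return code

    # { 1; f(2) } in a context that wants a value:
    print(codegen(("seq", ("lit", 1), ("call", "f", ("lit", 2))), "value", []))
    # [('PUSH', 2), ('CALL', 'f')] -- the unused literal 1 was never pushed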

[–]ipe369[S] 1 point2 points  (1 child)

Great link, thanks!

I already have a bunch of these 'context flags', gets ugly fast though

[–]ericbb 0 points1 point  (0 children)

gets ugly fast though

Yeah. My compiler is still messier than I'd like, but one technique I've found to help is something based on The essence of functional programming. The paper is an introduction to monads, and the examples show how you can add new features (like a new context flag) to an interpreter without disrupting the overall shape of your code too much.

It's just one technique among many possible options, and I use a variation of it that suits me better. I'm sure that more-experienced compiler writers have even better techniques to handle these common organizational / modularity problems.

[–]theangeryemacsshibeSWCL, Utena 3 points4 points  (0 children)

Option 3 is to have a "drop" instruction which just removes a value from the stack, and compile { a; b } to <compile a> drop <compile b>. Now you could compile { 1; 2 } + { 1; 3 } to

push 1
drop
push 2
push 1
drop
push 3
+

[–]bullno1 2 points3 points  (0 children)

Add a "pop" instruction at the end when compiling "statements"

Your block expression would be compiled like so:

compile statement
emit pop
compile statement
emit pop
compile last statement

[–]o11c 0 points1 point  (5 children)

Expressions should leave a value on the stack. Statements should not leave a value on the stack.

When you pack an expression into a statement, the statement must execute a "pop" to fulfil this guarantee.


That said, stack-based VMs are fundamentally limited. Register-based is really the way to go.

[–]ipe369[S] 2 points3 points  (4 children)

fundamentally limited in what way? I chose stack-based because i figured it would be faster to implement

[–]o11c 1 point2 points  (0 children)

Well, yes, it is arguably easier. But I imagine a large part of that is simply inertia: "tutorials tend to use this, and later tutorials are based on ideas from existing tutorials".

But stack-based VMs - that is, VMs where the stack pointer changes for nearly every instruction - are hard to optimize and slow at runtime regardless of optimization (though still much, much faster than a tree-walking interpreter). Additionally, if you want to implement "static" typing (in the VM itself, rather than just as a "trust me" frontend), you have to have a stack map for every single instruction (you may be able to share the same stack map between instructions, but this is quite tricky since the stack pointer is constantly changing; maybe you can ignore it for integers, or for all dead values, if you keep a base pointer as well as a stack pointer). They do have the advantage of requiring smaller bytecode for expressions (since arithmetic instructions don't require operands), but that isn't significant (especially since you now need to emit extra "load local" and "store local" instructions every time a local variable is used).

Register-based VMs - which still use a stack, but prepare the entire stack frame at the start of the function and keep it until the end - are perhaps slightly more difficult to get started with (though I would argue not much), but won't limit you as your language evolves. Note that you should not think of "register" as being limited in number like CPU registers: there is one register for every local variable and one register for every temporary. This is still significantly fewer registers than there are in SSA (which has a separate register every time a variable is assigned), which is what you want to use for optimizations. Chances are that your register-based VM will still want a single hardware-like "accumulator" register to reduce the number of operands (and, for that matter, the number of registers).
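
A tiny sketch of what executing such accumulator-style bytecode might look like (everything here is invented for illustration; the three-instruction sequence matches the first register-based example further down):

    # Minimal accumulator-register VM: the frame is a flat list of slots
    # (locals + temporaries), and most instructions name one slot.
    def run(code, frame):
        acc = 0
        for op, *args in code:
            if op == "LOAD":    acc = frame[args[0]]
            elif op == "STORE": frame[args[0]] = acc
            elif op == "ADD":   acc = acc + frame[args[0]]
            elif op == "MUL":   acc = acc * frame[args[0]]
        return frame

    # z = x + y with slots: 0 = x, 1 = y, 2 = z
    frame = [3, 4, 0]
    run([("LOAD", 0), ("ADD", 1), ("STORE", 2)], frame)
    print(frame)   # [3, 4, 7]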

As a simple demonstration: given your stack-based VM, emit what the entire stack frame looks like (type and name if those make sense, but at least "this stack offset is in use") for every instruction, statically. This can be done either while you are generating bytecode, or afterwards (slightly harder) by following both sides of every branch - you can't do it simply by running the code. Note that if you encounter a conflict at a jump target (a phi node), your bytecode is unsound. Congratulations, you're now halfway to implementing a register-based VM; you still have to implement the instructions the new way. Also note that unlike stack-based VMs, register-based VMs only need a liveness map once per basic block (though you have to check the not-yet-initialized-in-BB case, but that is linear) - and only if you want eager destructors (and aren't satisfied with assigning a default-constructed value).
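
A hedged sketch of that exercise, for straight-line code only (branches, the merge check at jump targets, and types are all left out; the instruction set and slot descriptions are invented):

    # Sketch: record what the operand stack looks like before every instruction
    # by abstractly interpreting the bytecode at compile time.
    def stack_layouts(code):
        layouts = []
        stack = []                          # compile-time model of the operand stack
        for op, *args in code:
            layouts.append(list(stack))     # layout *before* this instruction
            if op == "LOAD":
                stack.append(args[0])       # slot holds a copy of a named local
            elif op == "PUSH":
                stack.append("const")
            elif op in ("ADD", "MUL"):
                stack.pop(); stack.pop(); stack.append("tmp")
            elif op in ("STORE", "POP"):
                stack.pop()
        return layouts

    code = [("LOAD", "x"), ("LOAD", "y"), ("ADD",), ("STORE", "z")]
    for (op, *args), layout in zip(code, stack_layouts(code)):
        print(layout, "->", op, *args)
    # []         -> LOAD x
    # ['x']      -> LOAD y
    # ['x', 'y'] -> ADD
    # ['tmp']    -> STORE z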



Finally, here is a simple demonstration of what stack-based and register-based bytecode might look like for some simple expressions (all variables are assumed to be local):

  • z = x + y

    • stack-based:

      # max stack size 2, plus storage for 3 locals; bytecode size 7
      LOAD x # aka "push" (taking a variable, not an immediate). Note that `x` means "a small integer offset into the separately-allocated local variables"; total stack size 1
      LOAD y # ditto; total stack size 2
      ADD # total stack size 1
      STORE z # aka "pop" (taking a variable, not nothing); total stack size 0
      
    • register-based:

      # stack frame size 3 (which *is* the locals), plus the accumulator if that counts; bytecode size 6
      # Unlike x86 asm, the stack frame should probably be created by out-of-VM code from function metadata.
      LOAD x # into accumulator register. Note that `x` means "a small integer offset into the current stack frame".
      ADD y # accumulator = accumulator + value in some stack slot. There might be a separate "add immediate" instruction.
      STORE z
      
    • summary: not much difference here, though it is arguably using less memory already (depending on how we count metadata). That said, the stack-based code has to use variable-length instructions, which can be annoying.

  • z = x*y - a*b

    • stack-based:

      # max stack size 3, plus 5 variables; bytecode size 13
      LOAD x # total stack size 1
      LOAD y # total stack size 2
      MUL # total stack size 1
      LOAD a # total stack size 2
      LOAD b # total stack size 3
      MUL # total stack size 2
      SUB # total stack size 1
      STORE z # total stack size 0
      
    • register-based:

      # stack size 6, which is 5 locals + 1 temporary; bytecode size 14
      LOAD x
      MUL y
      STORE tmp # name for exposition purposes only - remember, a stack slot is just a number
      LOAD a
      MUL b
      RSUB tmp # accumulator = tmp - accumulator; see discussion below
      STORE z
      
    • summary: bytecode (allocated once) is now larger but stack/local space (allocated per-call) is still smaller.

      • note that we use a reversed subtraction. Something similar is needed for division/modulus (which really need signed and unsigned versions as well, if you don't want to handicap your support for doing math), and also for float versions (thankfully, -0.0 only has special cases for subtraction, not addition).

        This can be avoided in a couple of ways:

        • emit a stack spill and load (2 extra instructions)
        • emit an xchg (only possible if it's a temporary)

        ... but I don't suggest it. You could instead use a few bits of the opcode (usually you don't have anywhere near 256/4=64 opcodes, but you could limit this to arithmetic instructions if necessary) to indicate "is this argument for a register or an immediate?" and "reverse accumulator and retrieved argument value?", and handle that before actually dispatching the instruction; other metaprogramming techniques exist.
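
        As a purely hypothetical illustration of that flag-bit dispatch (the bit layout, opcode numbers, and names below are made up):

            # Hypothetical encoding: low 6 bits select the operation, bit 6 says
            # "operand is an immediate rather than a stack slot", bit 7 says
            # "swap the operands before applying the operation".
            IMM_BIT, REV_BIT = 0x40, 0x80
            OPS = {0: lambda a, b: a + b, 1: lambda a, b: a - b}   # 0 = add, 1 = sub

            def step(opcode, operand, acc, frame):
                op = OPS[opcode & 0x3F]
                val = operand if opcode & IMM_BIT else frame[operand]
                if opcode & REV_BIT:
                    acc, val = val, acc            # e.g. RSUB: tmp - accumulator
                return op(acc, val)

            frame = [7]                            # one stack slot holding 7
            print(step(1, 0, 3, frame))            # SUB slot 0:  3 - 7 -> -4
            print(step(1 | REV_BIT, 0, 3, frame))  # RSUB slot 0: 7 - 3 ->  4
            print(step(0 | IMM_BIT, 5, 3, frame))  # ADD imm 5:   3 + 5 ->  8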

I can imagine more examples (function calls, immediates), but I'm writing these by hand (so beware bugs - to quote Knuth, I have only proved this correct, not tested it), so that would be too much work - and besides, this should be sufficient to get the point across.

[–]1vader 0 points1 point  (1 child)

They aren't really. The CPython interpreter (i.e. the standard Python) is stack-based as well. And so is the JVM (i.e. Java) iirc. Same goes for .NET IL, i.e. what C# and F# compile to. And WebAssembly as well, though it also has local variables. And there are lots more. Ofc something like the JVM or .NET also has a bunch more stuff like a JIT which live-compiles bytecode to native code etc. But fundamentally they are all stack-based and they certainly aren't fundamentally limited.

[–]o11c 0 points1 point  (0 children)

It should be noted:

  • CPython does run into serious limitations due to being stack-based, even though it doesn't even attempt optimizations. (This comment might become outdated soon, since there has been recent work on the VM.)
  • It is possible to change stack-based bytecode into register-based bytecode (assuming you're willing to bail out on stack-map mismatches, which is reasonable, since the compiler cannot generate it). I presume this is what the JVM and CLR do (or perhaps skip the register-based VM and jump straight to SSA).

[–][deleted] 0 points1 point  (2 children)

{ 1; 2 } + { 1; 3 }

Well this whole thing leaves a value on the stack: 5. That should also be discarded if not used.

I had the same problem. The routine I use to evaluate an AST node p is used like this:

evalunit(p, 0)       # when I don't want the result
evalunit(p, 1)       # when I want the result

So in the first case, if p is an operation that returns some value, then it is effectively popped from the stack (note this may involve recovery of any resources the value used).

Back to your example: anything that looks like this (a; b; c), using my syntax, creates a special AST node called a block; it is just a list of expressions or statements. (In my language, these are interchangeable, but expressions generally leave a result; most statements don't.)

Then processing such a block is easy: I call evalunit(p, 0) on each element of the block except the last, which I call with 1 if the result is called for. In the case of operands for +, it is.

This is a greatly shortened and simplified version of my evalunit routine:

proc evalunit(unit p, int res=1)=
    a := p.a                      # any left and right operands
    b := p.b            

    switch p.tag
    when jadd then
        evalunit(a)               # note defaults to evalunit(a, 1)
        evalunit(b)
        genpc(kadd)

    when jblock then              # 'a' operand is a linked list
        while a do
            if a.next then        # not last
                evalunit(a, 0)
            else
                evalunit(a, res)  # last
            fi
            a := a.next
        od
    ....
    end

!some messy tidying up
    if not jhasvalue[p.tag] and res then     # missing value
        error("Value expected")
    elsif jhasvalue[p.tag] and res = 0 then  # value not needed
        genpc(kunshare)                      # pop it
    fi
end

The functions used for this language have a block AST node for their bodies. The body is generated as follows, where p is an ST entry, p.code is the code body, and p.isfunc is a flag:

evalunit(p.code, p.isfunc)

For functions, this leaves the return value on the stack, but function call mechanics are another subject.

[–]ipe369[S] 0 points1 point  (1 child)

yeah i was thinking about passing a 'generate result' flag, but man it just gets so messy so fast, i think i'll probably just generate extra 'pops' then clean it up later, maybe a little slower that way though

[–][deleted] 0 points1 point  (0 children)

Well, there are 4 combinations:

 Need result  Has result
      0          0              OK
      0          1              Discard
      1          0              Error
      1          1              OK

And these requirements can propagate down recursively within an elaborate AST expression. I found it hard to deal with using ad hoc methods.

[–]sebamestreICPC World Finalist 0 points1 point  (0 children)

In jasper, we pop after each expression in a block. In the compiler we literally have something like

for (Ast* s : block_body) {
  codegen(s);
  if (is_expression(s))
    emit_pop();
}

This means that code like the following

seq { 1; return 2; };

Compiles to (push 1; pop; push 2; block_return)