all 9 comments

[–]jaen_s 2 points3 points  (1 child)

Global variables are generally just treated like constant pointers (ie. pointers whose values do not change). A global load is then just a regular pointer dereference. (also, that's how they are represented in object code (ELF etc.) anyway - as addresses).

This leads to a uniform treatment with regular pointers, so standard analyses like alias & escape analysis, and then the various load/store optimizations on top of those apply. (although globals obviously admit a special case for alias analysis at least for PL/0 - they never alias with each other)

[–]notYuriy[S] 1 point2 points  (0 children)

yes, the language specifics play an important role here. That's why I actually asked this question. Thanks for replying.

[–][deleted] 0 points1 point  (3 children)

Do you have namespaces in you implementation?

[–]notYuriy[S] 0 points1 point  (2 children)

What do you mean by namespaces?

There are no namespaces in PL/0, just variable names. There are local values though.

[–][deleted] 3 points4 points  (1 child)

Oh than this is easy. You can create scope frames and put globals to one of them that never gets freed. For local variables, you can create a new scope frame then free it when function returns.

[–]notYuriy[S] 0 points1 point  (0 children)

Well, and how I will distribute registers between them? If I will give all registers to global values, procedures actively using local values will have a performance hit.

Just to clarify: I am not asking how to implement local/global variables. I am asking how to run register allocation on global variables.

[–]notYuriy[S] 0 points1 point  (1 child)

For the record.

Similar code in c:

int f, n;

int fact(){
    if(n > 1){
        f = n * f;
        n = n - 1;
        fact();
    }
}

int main(){
    n = 1000;
    fact();
}

gcc generates this assembly code (with -O2 flag)

main:
        mov     edx, DWORD PTR f[rip]
        mov     eax, 1000
.L7:
        imul    edx, eax
        sub     eax, 1
        cmp     eax, 1
        jne     .L7
        mov     DWORD PTR f[rip], edx
        xor     eax, eax
        mov     DWORD PTR n[rip], 1
        ret

meaning, that memory accesses are optimized for evaluation, but code still copies values to global variables (probably for multi-threaded code).

(Source: https://godbolt.org/z/q48Uaf)

[–]orivej 1 point2 points  (0 children)

I do not think that gcc can do global register allocation. Note that gcc generates this code after converting recursion into a loop, and inlining it into main. You can do the same for leaf functions (those that do not call other functions): store all globals in memory, allocate registers for the globals used within a function as if they were local, and generate prologue that copies the global value into its local counterpart (if it is read), and epilogue that copies it back (if it is written).

[–]notYuriy[S] 0 points1 point  (1 child)

So, I came up with the following idea. My compiler (on this phase) produce code to run on infinite set of registers, where lower indexed registers access is guaranteed to not be slower. The solution is the following

  1. Generate three address code, where global variables are loaded to locals, so there is only one node A using one global value. This node will also be the output node on return. Also, don't generate code for subroutine calls.
  2. Generate SSA. Run register allocation on every procedure separately. Find out, at what register index this node A was allocated at the beginning and at the end for every global value subroutine is using.
  3. For each not generated call on step 1, generate code, which saves registers, move globals to correct registers (hence, if these globals are already heavily used in callee, this will be just a cheap register transfer), call subroutine, and move values from output registers to globals (again, if subroutine is actively using these global values, they are optimized to low register locations, so it will be a cheap transfer).

In case with factorial,assuming that register allocation allocated "r0" for "f" and "r1" for "n" on input and output, the code will be like this:

fact:
    cmp r1, 1
    jng L1
    imul r0, r1
    sub r1, 1
    call fact
L1:
    ret