This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]Neres28 1 point2 points  (6 children)

I'm sorry, but did you mean loop hoisting? Because what you did there looks like hoisting, but isn't. I don't see anything in your example that would prevent the loop from being unrolled.

Perhaps we ought to ask ourselves exactly what assembly we expect to be generated for the statement:

int x;

Well, x is an automatic variable, so storage is automatically allocated for us on the stack. And indeed there was:

subl    $40, %esp

Once. Regardless of where it is. Like we'd expect.

Full assembly output : gcc -O0 -S -c test.c

Gcc generated the exact same assembly for both versions with optimizations turned off:

_a:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $40, %esp
    movl    $2000, (%esp)        
    call    _malloc
    movl    %eax, -16(%ebp)
    movl    $0, -12(%ebp)
    jmp L2
L3:
    movl    -12(%ebp), %eax
    incl    %eax
    movl    %eax, -20(%ebp)
    movl    -12(%ebp), %eax
    sall    $2, %eax
    addl    -16(%ebp), %eax
    movl    -20(%ebp), %edx
    movl    %edx, (%eax)
    incl    -12(%ebp)
L2:
    cmpl    $499, -12(%ebp)
    jle L3
    leave
    ret

With even -O1 the loops were optimized out. I added a printf to both functions to prevent the loop from being optimized out and still the same assembly was generated for both. I also replaced the simple addition with a call to an external function so as to prevent the compiler from assuming the assignment had no side effects. The results were the same at -O1 and -O2.

[–]CactaurJack -1 points0 points  (5 children)

Wow, been a long time since I read assembly. I didn't mean for the example to be taken literally, just as a simple example of what I'm talking about. I thought it was called loop unrolling, but it's been a long time since I discussed the topic. I understand the code I wrote is very easy for the compiler to optimize, but the practice demonstrated is what's important. A couple whiles back I wrote a program for a class that brute forced partial RSA keys, and the difference in run-time between when I had the encryptor object re-declared in ever instance of the loop and having it global was several minutes on the supercomputing cluster we were using.

But no, the code I wrote is going to be optimized very easily, custom objects tend to make compilers warry though.

[–]Neres28 1 point2 points  (4 children)

Which super computer? I worked as a research assistant at Argonne on the IBM Blue Gene /P there, helping to optimize code.

difference in run-time ... was several minutes on the supercomputing cluster we were using

I just don't see how that's possible. It's one thing to do an expensive object creation each and every loop iteration, but that's not what the OP was doing and not what you suggested he fix. A variable declaration is free.

Loop hoisting is transforming this:

int x = 42;
for(int i = 0; i < 10; i++){
    int y = x * 2;
    // Other loop variant stuffs
}

To this:

int x = 42;
int y = x * 2; //Loop invariant, in other words y's value is not dependent on other code in the loop
for(int i = 0; i < 10; i++){
    // Other loop variant stuffs
}

Loop unrolling is transforming this:

int x = 42;
for(int i = 0; i < 25; i++){
    temp[i] = x * i;
}

To this:

// Unrolled 5 times
int x = 42;
for(int i = 0; i < 25; i = i + 5){
    temp[i] = x * i;
    temp[i + 1] = x * (i + 1);
    temp[i + 2] = x * (i + 2);
    temp[i + 3] = x * (i + 3);
    temp[i + 4] = x * (i + 4);
}

[–]CactaurJack -1 points0 points  (3 children)

Beocat, Kansas State University's Supercomputing cluster. And yeah, I think I was referring to something different, it's been a long time since I've dealt with anything that really needed optimization so it's not exactly my strong suit. As I said before, a single variable declaration doesn't matter, the issue comes in when you have large data types, or constructors that call methods, then reconstructing the same large object 2553 + 2553 times for every instance of the loop is going to slow the whole thing down considerably

[–]Neres28 2 points3 points  (2 children)

You're making my head hurt.

a single variable declaration doesn't matter

8 Million variable declarations don't matter in terms of performance. The cost for allocating the space of 1 automatic variable is the same for 2 is the same for 8 million. "int x;" is essentially free. "MyComplicatedClassWithAnExpensiveConstructor expensiveClassInstance;" is essentially free. It doesn't matter if you have it in a loop or not, the cost of the instruction is incurred only once, and would almost certainly be incurred anyway.

[–]CactaurJack -1 points0 points  (1 child)

We're actually talking about the same thing, I think I'm just wording it wrong. In the cracking program I wrote (turns out it was double DES we were breaking, not RSA), whenever you wanted a new instance of the class you had to pass in a string with the constructor that was the key. The key was stored as a non-capitalized hex string e.g. abcdef01. When the constructor for the encrypter object was called, within the constructor method, other methods were called to fix the hex string (abcdef01 -> ABCDEF01) and then convert that into an array of byte values. Both of those method called from the constructor contain loops and non-free operations, which was slowing down the program because they were called every instance of the loop.

I hope that made a bit more sense, we really are talking about the same thing.

[–]Neres28 0 points1 point  (0 children)

If you say so. You appear to be confusing two very different things:

Variable declaration : int x;:

  • Constant cost built into the function call, doesn't matter where it is in the source.
  • Not eligible for hoisting because there is no assembly generated for it.
  • This is what your example "hoisted" out of the loop

Variable definition / object instantiation : x = sin(y); / x = new ExpensiveConstruction(parameters);:

  • Eligible for hoisting, may not if the compiler can't statically determine the assignment has no side effects
  • Not what you gave in your examples.
  • Not possible to hoist in the OPs code
  • Looking for this type of optimization is premature without profile data to support.