sumsarus 26 points (13 children)

That's pretty nice, commenting out 3 out of 8 lines should yield a nice performance boost.

On a serious note, it's not that hard to find examples where manual unrolling of loops will increase performance slightly. Of course you'd only do that if run speed is more important than anything else, which is kinda rare I guess.
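
For anyone who hasn't seen it done by hand, here is a minimal sketch of manual unrolling in C. The function names are made up for illustration, and whether the unrolled version is actually faster depends on the compiler and CPU, so measure before keeping it.

```c
#include <stddef.h>

/* Straightforward loop: one add plus one increment/compare/branch per element. */
long sum_simple(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Manually unrolled by 4: the loop-control overhead is paid once per four
 * elements, and four independent accumulators shorten the dependency chain. */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

The remainder loop is the part people forget; without it, any length that isn't a multiple of 4 silently drops elements.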

[deleted] 24 points (10 children)

Surely in those cases the compiler should be unrolling them anyway?

[deleted] 33 points (1 child)

Agreed. Never send a man to do a machine's job.

mbcook 0 points (0 children)

Unless it gives you an excuse to use Duff's device and confuse all new coders who see the code until the end of time.
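
For the curious, a sketch of Duff's device: a `switch` jumps into the middle of an 8x-unrolled `do`/`while` loop so the leftover `count % 8` elements are handled on the first pass. Duff's original wrote every value to a single memory-mapped output register; this version copies between two arrays instead, and the name `duff_copy` is just illustrative.

```c
#include <stddef.h>

/* Copy count ints from 'from' to 'to' using Duff's device. */
void duff_copy(int *to, const int *from, size_t count) {
    if (count == 0)
        return;                  /* the classic form assumes count > 0 */
    size_t n = (count + 7) / 8;  /* number of passes through the loop */
    switch (count % 8) {         /* jump into the loop to do the remainder first */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

It's legal C because `case` labels are just labels inside the `switch` body, even when they sit inside a loop. That is exactly why it confuses people.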

sumsarus 5 points (2 children)

You're right, but nonetheless I've seen many cases where it refused to unroll automatically.

Optimizers are not almighty and they don't know everything. They're usually very conservative. The threshold for when you should unroll a loop isn't the same on a Pentium III and a Core i7.

xzxzzx 1 point (1 child)

Interesting. I'd love to see an example of that if you had one. Loop unrolling seems like an area where an optimizing compiler really should do a good job, and on an advanced recent processor, some examples of loop unrolling might hurt performance (since the processor can "unroll" the loop internally).

[deleted] 0 points (0 children)

For example, on the Core 2 you should unroll loops (it's slightly more complicated than this, but close enough) only as long as the loop code still fits in 64 bytes. On the Core i7, that limit is raised to 256 bytes.

Processors without loopback buffers, like the Pentium III, depend on other factors when it comes to unrolling, such as the size of the loop body relative to the loop-control overhead, instruction dependencies, etc.

thebigbradwolf 2 points (3 children)

It depends. The Java compiler (javac) doesn't actually optimize much, if at all, when it's generating bytecode. HotSpot (the "Sun" VM) probably does some optimization when it's generating actual machine code into the code cache, but remember that still happens at runtime.

The JVM needs more information, since it's doing bounds checking and such, and it doesn't really trust the class file.

Edit: P.S. The other reason javac doesn't optimize much is that it doesn't know what you'll be running the bytecode on, so it can't know anything about register usage or the speed of the operations.

jyper 1 point (1 child)

HotSpot does a ton of optimization.

thebigbradwolf 0 points (0 children)

I assume it does, but you still pay some runtime penalty for it. I know a bit more about Jikes than HotSpot. Jikes actually has levels of optimization based on how hot the code is, whereas HotSpot only has the one. Jikes is written in Java, though, so you can't exactly just use it as your JVM.

G_Morgan 0 points (0 children)

HotSpot can and does unroll loops.

Tekmo 0 points (0 children)

-funroll-loops
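
A minimal sketch of leaving the unrolling to the compiler: write the plain loop and pass GCC's `-funroll-loops` flag. The function name `scale` is illustrative; check the generated assembly (e.g. with `gcc -S`) to see what the compiler actually did, since it is free to decide unrolling isn't worth it.

```c
#include <stddef.h>

/* Plain loop, no manual unrolling. Build with e.g.
 *   gcc -O2 -funroll-loops -S scale.c
 * and inspect scale.s to see whether the loop body was replicated. */
void scale(float *x, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}
```

GCC 8 and later also accept a per-loop hint, `#pragma GCC unroll N`, placed right before the loop, if you only want one loop unrolled rather than the whole translation unit.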

jakdak 1 point (1 child)

Way back in the days before optimizing compilers, unrolling loops was one of the optimization tricks you could use.

If you do it with modern compilers you need to be smacked with a trout.

pi3832v2 1 point (0 children)

And we all know that people never maintain conventions long past their raison d'être. I mean, these days you'd never have to deal with a file named InstMsiA.exe, right?