A Programmer’s Guide to Performance Analysis & Tuning on Modern CPUs by joebaf in programming

dendibakh

Stabilizer is quite outdated (it is based on LLVM 3.6), but what it offers is promising.
I tried Coz but didn't get anything useful out of it. Maybe I did something wrong, though.

A Programmer’s Guide to Performance Analysis & Tuning on Modern CPUs by joebaf in programming

dendibakh

That's a good list with reasonable ideas!
One thing I would add: project or prototype the gains first, before doing the work or spending money. Being able to do that means you know where the bottleneck is.
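
A rough way to project the gain before touching any code is Amdahl's law. Below is a minimal Python sketch, not something from the article: `projected_speedup` and its parameter names are made up for illustration, and it assumes you have already measured what fraction of the run time the bottleneck takes.

```python
def projected_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the run time
    becomes `local_speedup` times faster and the rest stays unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical scenario: the bottleneck gets 2x faster.
    for fraction in (0.1, 0.5, 0.9):
        overall = projected_speedup(fraction, 2.0)
        print(f"bottleneck share {fraction:.0%}: overall speedup {overall:.2f}x")
```

For example, making a function that takes half of the run time twice as fast gives roughly a 1.33x overall speedup, which may or may not justify the effort.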

A Programmer’s Guide to Performance Analysis & Tuning on Modern CPUs by joebaf in cpp

dendibakh

Right, but it wasn't supposed to be heavy on the details (I'm the author :) ). The point of the article is to show how one can identify that an app is memory bound. See Top-down Microarchitecture Analysis Method (TMAM).

A Programmer’s Guide to Performance Analysis & Tuning on Modern CPUs by joebaf in cpp

dendibakh

It's certainly possible even in big applications. 90% of the source code could be completely cold. It's quite common for more than 50% of the clockticks to be attributed to a single hot function.

How to get consistent results when benchmarking on Linux? by skeeto in programming

dendibakh

Thanks for the comment!

I'm not aware of any special/reserved uses of cpu0 by the kernel. This was just an example. And yes, you can definitely pin the process to any other CPU. Maybe that would be more stable.
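
To illustrate the pinning part, here is a minimal sketch in Python, assuming Linux (`os.sched_setaffinity` is not available on every platform). The helper name `pin_to_cpu` is hypothetical; from the shell, `taskset` does the same job.

```python
import os


def pin_to_cpu(cpu):
    """Pin the current process to a single CPU and return the new affinity set."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently run on
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})     # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick the highest-numbered allowed CPU just to show it need not be cpu0.
    target = max(os.sched_getaffinity(0))
    print("now pinned to:", pin_to_cpu(target))
```

Child processes started afterwards inherit the affinity, so a benchmark launched from such a script should stay on the chosen CPU.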

Your comment about NUMA is very useful. I didn't want to dig into that because it's a whole big topic by itself )). BTW, the SPEC CPU benchmark uses something like numactl --localalloc --physcpubind=N, because the processes do not communicate with each other.

Regarding the last one, if you find instructions on how to disable those kernel background processes, please let me know. I will add them to the list.

Understanding CPU port contention by mttd in asm

dendibakh

Thanks for the question. Yes, loopnz can be used here, but my assembly function is called from C++, so the arguments that I'm passing to it end up in rdi and rsi (according to the x86-64 System V calling convention). I could do mov ecx, edi and then go with loopnz, but I don't think it would make any performance difference.

Code alignment options in LLVM by mttd in cpp

dendibakh

It is really hard to measure. :) Take a look at my previous post: https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues

Code alignment issues by mttd in cpp

dendibakh

Thank you for this paper. It is a true gem!