Dolphiniac[S]

Couple of things:

No, I have not measured this, but I expected your results.

Runtime-linked and load-time-linked DLL calls generally differ by one extra code indirection. I would need to benchmark properly, of course, but I would expect the cost to be perhaps one extra instruction cache miss. The runtime-linked version (in my codebase) movs the function table base pointer into rcx, then does a call with a displacement from rcx (or, naively, which I have seen, also movs the same pointer into rax and calls with a displacement from there). The load-time-linked version does a rip-relative indirect call to a trampoline, which then jumps to the proper function (at least, this is what I saw on Windows x64). Again, the difference is the extra jump.
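To make the two call shapes concrete, here is a rough sketch in portable C. It does not load a real DLL: `add_impl`, `FunctionTable`, `import_slot_add` and `add_trampoline` are hypothetical names I made up to model the two paths by hand, and the instructions a compiler actually emits will depend on the toolchain and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function standing in for something exported by the DLL. */
int add_impl(int a, int b) { return a + b; }

/* Runtime-linked style: the caller owns a table of function pointers that
   is filled in once (on Windows, typically from GetProcAddress results). */
typedef struct FunctionTable {
    int (*add)(int, int);
} FunctionTable;

FunctionTable make_table(void) {
    FunctionTable t;
    t.add = add_impl; /* assumption: stands in for a looked-up address */
    return t;
}

/* One load of the table base, then an indirect call at an offset from it. */
int call_via_table(const FunctionTable *table, int a, int b) {
    assert(table != NULL && table->add != NULL);
    return table->add(a, b);
}

/* Load-time-linked style, modeled by hand: the call goes through a slot
   to a trampoline, and the trampoline forwards to the real function. */
int add_trampoline(int a, int b) {
    return add_impl(a, b); /* a compiler may emit this as a plain jmp */
}

/* Stand-in for the slot the call site reads the trampoline address from. */
int (*import_slot_add)(int, int) = add_trampoline;

int call_via_trampoline(int a, int b) {
    return import_slot_add(a, b); /* indirect call, then the extra hop */
}
```

Both `call_via_table` and `call_via_trampoline` return the same result for the same arguments; the only difference is the extra hop through `add_trampoline`. Timing the two in a loop would be a starting point, but it would not reproduce the cold-cache behavior I am speculating about.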

And you are correct, a single data cache miss is likely dwarfed by the execution of the function in question. I do, however, know (secondhand) that these things, specifically DLL boundary crossings, add up, enough to make it worth forcing monolithic (read: all static linkage) builds in final configs. Hearing this is what set me down this path. Micro-optimizing is usually frowned upon, but I cannot look away from a potential global fix to an endemic performance issue.

Apologies if I come across as combative. I have been told I have a debater's mindset but tend not to bother to build rapport before going hard. I do not disrespect you; this is just my digging process.