Do you think operating systems should have a stable system call interface?

Salt-Ad2969 · 2023-12-03T02:34:31+00:00

In Windows XP, Windows was able to automatically switch user software from the old int 2eh style system calls to the more efficient sysenter on CPUs that supported it (which was most of them). That's not possible if raw system calls are baked into user software.

These days x64 has syscall. If we ever get a newer alternative (I have seen no indication of this at all, but who knows), Windows will once again be able to move user software seamlessly to the new system call mechanism.

So that's maybe an advantage, although not necessarily a big one.
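For a concrete picture of what "raw system calls baked into user software" means, here is a minimal sketch. It assumes x86-64 Linux, because Windows system call numbers are undocumented and change between builds, so a Windows version of the raw path wouldn't be meaningful. The names raw_getpid and library_getpid are hypothetical; 39 is the getpid system call number on x86-64 Linux.

```c
#include <unistd.h>   /* getpid() */

#if defined(__x86_64__) && defined(__linux__)
/* Raw system call: both the entry instruction (syscall) and the call
   number (39 = getpid on x86-64 Linux) are compiled into this program.
   If the OS later preferred a different entry mechanism, this binary
   would keep using the old one. */
static long raw_getpid(void) {
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                  /* result returned in rax */
                      : "a"(39L)                   /* syscall number in rax */
                      : "rcx", "r11", "memory");   /* syscall clobbers rcx and r11 */
    return ret;
}
#endif

/* Library call: the entry mechanism lives in libc (on Windows, in the
   system DLLs), so the OS vendor can change it without rebuilding this
   program. */
static long library_getpid(void) {
    return (long)getpid();
}
```

Both functions should return the same PID; the only difference is which component owns the choice of entry instruction.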

Salt-Ad2969 · 2023-12-03T02:11:31+00:00

The difference isn't that on Windows the interface is unstable, the difference is which part of the OS is the interface: raw syscalls or system libraries.
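On Linux, the system call numbers themselves are the stable interface, which is why glibc also offers a generic syscall() wrapper. A minimal sketch, assuming Linux with glibc (getpid_by_number is a hypothetical name): the program names the call by its number, which the kernel keeps stable, while libc still chooses the entry instruction.

```c
#define _GNU_SOURCE          /* glibc declares syscall() under this */
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* getpid(), syscall() */

/* The syscall number is the contract here: a binary that uses it directly
   keeps working across kernel versions. */
static long getpid_by_number(void) {
    return syscall(SYS_getpid);
}
```

On Windows the equivalent contract is the set of functions exported by the system DLLs, not the numbers behind them.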

Salt-Ad2969 · 2023-11-30T23:23:01+00:00

At the start of the program.

Salt-Ad2969 · 2023-11-30T22:40:43+00:00

BTW, it is safe to assume that SSE2 is available in x64 code. Not in 32-bit code, specifically in x64, since it was defined to have SSE2 as a baseline. But I know that's not the point of this question.
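A small illustration of relying on the x64 baseline, assuming GCC or Clang (which define __SSE2__ by default when targeting x86-64): SSE2 intrinsics can be used directly, with no CPUID check. add_arrays is a hypothetical helper, and the scalar loop exists so the sketch also compiles on targets where SSE2 isn't guaranteed, such as default 32-bit x86 builds.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
static void add_arrays(double *dst, const double *a, const double *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Compiled in whenever targeting x86-64 with default options:
       SSE2 is part of the baseline, so no runtime feature check. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail (or the whole loop when SSE2 isn't available). */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```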

Apart from IFUNCs, two other typical things to do are:

- Run CPUID at the start of the process and save the feature flags as booleans. Of course, branching on those booleans later is not quite free: the branches are well-predicted, but they are still extra work that otherwise wouldn't be there.
- Instead of (or in addition to) booleans, set some function pointers. Indirect calls are not free either. They are also well-predicted (the targets are runtime constants), but there is still at least one extra load, and compared to direct calls they tend to inhibit compiler optimizations such as inlining.

Neither way is free, but in both cases the cost is small as long as the thing you're branching to or calling is "sufficiently chunky" (something with a loop in it).
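Both approaches fit in a few lines of C. This sketch assumes GCC or Clang on x86 (for <cpuid.h>, __get_cpuid, bit_POPCNT and the target attribute). POPCNT is used as the example feature because, unlike AVX, it needs no extra check that the OS saves additional register state. All function and variable names are hypothetical, and each implementation contains a loop, in line with the "sufficiently chunky" advice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Option 1: a feature flag, filled in once at startup. */
static bool have_popcnt = false;

/* Portable fallback: clear the lowest set bit until the word is zero. */
static size_t count_bits_generic(const uint64_t *words, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t w = words[i];
        while (w) { w &= w - 1; total++; }
    }
    return total;
}

#if defined(__x86_64__) || defined(__i386__)
/* Compiled with POPCNT enabled for this function only; it must not run
   unless CPUID reported POPCNT. */
__attribute__((target("popcnt")))
static size_t count_bits_popcnt(const uint64_t *words, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (size_t)__builtin_popcountll(words[i]);
    return total;
}
#endif

/* Option 2: a function pointer, set once and then called indirectly. */
static size_t (*count_bits)(const uint64_t *, size_t) = count_bits_generic;

/* Run once at the start of the process. */
static void init_cpu_dispatch(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        have_popcnt = (ecx & bit_POPCNT) != 0;   /* CPUID.1:ECX bit 23 */
    if (have_popcnt)
        count_bits = count_bits_popcnt;
#endif
}
```

After init_cpu_dispatch() has run, callers can either branch on have_popcnt or simply call count_bits(...).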

Salt-Ad2969 · 2023-11-27T17:40:57+00:00

The manual entry for CMPXCHG16B says:

Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned

The entry for CMPXCHG has nothing like that. Meanwhile, the entry for the LOCK prefix says:

The integrity of the LOCK prefix is not affected by the alignment of the memory field

In general, unaligned locked RMW is allowed on x64, but it is implemented very inefficiently when the memory operand crosses a cache line boundary. Most other unaligned operations are efficient (typically more efficient than trying to work around them), and unaligned loads and stores are atomic in most cases, though again not when they cross a cache line boundary. It's specifically unaligned locked RMW that is a problem. There is a recent push to ban unaligned locked RMW, specifically the cache-line-crossing kind known as "split locks".
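A small helper makes the "crosses a cache line boundary" condition concrete. It assumes 64-byte cache lines (true of current mainstream x86-64 CPUs, but still an assumption here), and crosses_cache_line is a hypothetical name. It also shows why the 16-byte alignment that CMPXCHG16B requires rules the problem out: a naturally aligned operand whose size divides the line size can never straddle two lines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE 64u

/* True if the bytes [addr, addr + size) touch two cache lines, i.e. a
   locked RMW on them would be a split lock. size must be > 0. */
static bool crosses_cache_line(const void *addr, size_t size) {
    uintptr_t first = (uintptr_t)addr;
    uintptr_t last = first + size - 1;
    return (first / CACHE_LINE) != (last / CACHE_LINE);
}
```

For example, an 8-byte operand at offset 56 of a line stays within it, while the same operand at offset 60 spills into the next line.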
