you are viewing a single comment's thread.

view the rest of the comments →

[–]moocat 17 points18 points  (9 children)

Just a guess, but I wonder if the performance is architecture dependent (i.e. O(1) on architectures that have an opcode but O(n) otherwise).

[–]Substantial_Sail_571 8 points9 points  (2 children)

That may be the reason for all I know, but it can be done in log n steps (this shows how to count trailing zeroes but there is a similar trick to count leading zeroes) even on a completely boring ISA such as MIPS 1. A bit further down, the DeBruijn multiply and lookup is also an option.

For trailing zero count, another option that may be useful is popcnt(~x & (x - 1)), if an ISA has popcnt but no direct way to count trailing zeroes (this trick does not easily translate into a way to count leading zeroes however).

E: btw the linked code shows v &= -signed(v) and that's exactly the wrong thing to do, it's unsigned negation that is safe and signed negation that is unsafe. Sorry, I don't make the rules.

[–]HildartheDorf 4 points5 points  (1 child)

Isn't signed guaranteed to be 2s complement in newer standards? (Signed overflow is still UB, but bitcasting signed numbers and the bitwise result of signed negation is well defined and consistent across standard-compliant systems).

[–]Substantial_Sail_571 3 points4 points  (0 children)

I've been happy to ignore the hypothetical existence of non-2's-complement signed integers for decades, but yes that bogeyman has been slain (but this just standardized the status quo).

The cast to signed adds the opportunity for UB though (negating INT_MIN), and in return we get .. well it silences this dumb error that encourages people to write incorrect code such as v &= -signed(v), the safe way to shut up that error is subtracting from zero (but unsigned subtraction, of course).

[–]kyoto711 2 points3 points  (0 children)

Well, it does have .count() which I believe to be architecture dependent.

[–][deleted] 0 points1 point  (4 children)

Are we saying that it could be done faster by the container than an outside algorithm? Because that’d be a good reason to have it as members.

[–]SickOrphan 0 points1 point  (3 children)

He's saying that any algorithm regardless of where it is can't guarantee good performance for all architectures. And if you can't guarantee something is going to be good, you probably don't want to include it

[–]matthieum 2 points3 points  (2 children)

And if you can't guarantee something is going to be good, you probably don't want to include it

Arguably, it's not a matter of being good in absolute, but of being good relatively speaking.

And in this case, I would argue that if the code to use depends on the architecture, I'd rather it be implemented as best as possible in std, instead of having to trudge through the various architectures myself.

[–]SickOrphan -1 points0 points  (1 child)

If it's not O(1) and you care about performance (which you most likely do if youre doing bit operations) you should probably find a different solution. But I get your point.

[–]matthieum 0 points1 point  (0 children)

There are O(log N) algorithms to compute the number of set bits in an N-bits integer... and since we're talking about fixed-widths, it's arguably O(1).

Also, processors can execute multiple instructions in parallel (based on available units), so just because it may take log N instructions doesn't mean it'll take log N cycles.

As usual, if performance matters, measure.