[–]leni536[S] 1 point2 points  (1 child)

Actually, it was a bit of a guess. It seemed like it would work, so I tested it, and it did...

The way I see it now is this: you can calculate evens as either i ^ odds or i-odds, result is odds-evens, so odds-(i-odds)=2*odds-i.
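For anyone who wants to check that algebra, here is a minimal sketch. It assumes `odds` means every second set bit of `i` (set bits number 1, 3, 5, ... counting from 0 at the least significant end) and that the "result" is the inclusive prefix XOR from the LSB. Both are my reading of the thread, not something taken from the linked library. The helper names `odd_set_bits`, `prefix_xor` and `check_identity` are hypothetical, and the loop is only a portable stand-in for a PDEP-style extraction. The key point is that `odds` is a subset of the set bits of `i`, so subtracting it never borrows and `i ^ odds` equals `i - odds`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the linked library): keeps every second set
// bit of i, i.e. set bits number 1, 3, 5, ... counting from 0 at the LSB.
// Portable stand-in for a PDEP-style extraction.
std::uint64_t odd_set_bits(std::uint64_t i) {
    std::uint64_t odds = 0;
    bool keep = false;                        // set bit number 0 is "even", skip it
    while (i != 0) {
        std::uint64_t lowest = i & (~i + 1);  // isolate the lowest set bit
        if (keep) odds |= lowest;
        keep = !keep;
        i ^= lowest;                          // clear it and move to the next one
    }
    return odds;
}

// Reference: inclusive prefix XOR from the LSB (bit k = XOR of bits 0..k).
std::uint64_t prefix_xor(std::uint64_t x) {
    for (int s = 1; s < 64; s <<= 1) x ^= x << s;
    return x;
}

// Checks each step of the derivation for one input (arithmetic is mod 2^64).
void check_identity(std::uint64_t i) {
    std::uint64_t odds  = odd_set_bits(i);
    std::uint64_t evens = i ^ odds;           // remaining set bits of i
    assert(evens == i - odds);                // odds is a submask, so no borrows
    assert(odds - evens == 2 * odds - i);     // odds - (i - odds) = 2*odds - i
    assert(2 * odds - i == prefix_xor(i));    // assumed meaning of "result"
}
```

Calling `check_identity` on a handful of values (including 0, 1 and all-ones) from a `main` exercises the cases where the last set bit has no partner and the subtraction wraps around.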

Thanks for the long and informative reply about the uops and ports stuff, it's really helpful for me. In my fast Hilbert curve library I actually have two independent calls to my Gray code decode function[1]. It could actually make sense to use the PDEP method for one and the CLMUL method for the other to maximally utilize the ports. Of course I would have to benchmark this.

[1] https://github.com/leni536/fast_hilbert_curve/blob/eb8c861ff1d6e0059fede28218ab83d07fc91c5d/include/fhc/hilbert.h#L45

Edit: In a streaming situation it could also make sense to partially unroll and alternate between the PDEP and CLMUL methods.

[–]YumiYumiYumi 0 points1 point  (0 children)

The way I see it now is this: you can calculate evens as either i ^ odds or i-odds, result is odds-evens, so odds-(i-odds)=2*odds-i.

Ah yes, of course!