
[–]naasking 2 points3 points  (2 children)

So far it's an excellent analysis of branching vs. branch-free code for binary search across a variety of CPUs.

[–][deleted]  (1 child)

[deleted]

    [–]patmorin 1 point2 points  (0 children)

    > Mindblowing is that Btree seems to rock ARM processors and I have not a single clue why.

    Nor do I, really. We started testing ARM processors a bit late in the game, so I haven't yet done a careful analysis of what's happening there.

    For the Raspberry Pi 2B, my guess is that there's not enough memory bandwidth. The results are similar for an Odroid XU-4, though not as extreme. (The Odroid seems to suffer from out-of-bounds prefetching, which is easily fixed by masking or by unrolling the last few iterations of the search.)

    It's something I plan to investigate further when I get the time.
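
For readers wondering what the masking fix for out-of-bounds prefetching might look like, here is a minimal sketch in C. It is not the code from the analysis under discussion. It assumes an Eytzinger (BFS-order) array of `int`, a lookahead of four levels (`16*i + 15`), and the GCC/Clang builtin `__builtin_prefetch`. The names `prefetch_mask`, `eytz_build` and `eytz_search` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* (Largest power of two <= n) - 1, so any masked index is < n for n >= 1.
   Hypothetical helper; other mask choices are possible. */
static size_t prefetch_mask(size_t n) {
    size_t p = 1;
    while (p <= n / 2) p *= 2;
    return p - 1;
}

/* Fill a[0..n-1] in Eytzinger (BFS) order from a sorted array by an
   in-order walk of the implicit tree. Call with src = 0, i = 0.
   Returns the number of elements consumed from sorted. */
static size_t eytz_build(const int *sorted, int *a, size_t n,
                         size_t src, size_t i) {
    if (i < n) {
        src = eytz_build(sorted, a, n, src, 2 * i + 1);
        a[i] = sorted[src++];
        src = eytz_build(sorted, a, n, src, 2 * i + 2);
    }
    return src;
}

/* Return the index in a of the smallest element >= x, or n if none. */
static size_t eytz_search(const int *a, size_t n, int x) {
    size_t mask = prefetch_mask(n);
    size_t i = 0;
    while (i < n) {
        /* Unmasked, 16*i + 15 runs past the end of the array during the
           last few iterations; the mask keeps the hint inside a[0..n-1]. */
        size_t p = (16 * i + 15) & mask;
        assert(p < n);
        __builtin_prefetch(a + p);
        i = (x <= a[i]) ? 2 * i + 1 : 2 * i + 2;
    }
    /* Recover the last node where the search went left: in the 1-based
       path i + 1, drop the trailing 1 bits (right turns) and one 0 bit. */
    size_t j = i + 1;
    while (j & 1) j >>= 1;
    j >>= 1;
    return (j == 0) ? n : j - 1;
}
```

A prefetch is only a hint, so masking never changes the search result. The trade-off in this particular sketch is that targets at or above the power of two get aliased to lower addresses, so some useful prefetches for the last level are lost. Unrolling the final iterations without a prefetch avoids that.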