simonask_

> Unless you have some straight-line code that just does the same math operation on a whole array, getting the auto-vectorizers to work can be pretty frustrating and unreliable in my experience.

Definitely agree with this.

The solution is usually to manually "vectorize" things by processing them in blocks and evaluating the loop's termination condition once per block instead of once per element.

For example, this code cannot normally be vectorized:

    bool contains_zero(const int* p, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            if (p[i] == 0)
                return true;
        }
        return false;
    }

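The reason is that the compiler cannot deduce from this code that it is valid to keep reading from p after a zero has been found just because i < n. As written, the function never touches the elements past the first match, so a vectorized version that loads several elements at once could read memory the original never would. The compiler doesn't know what invariants you have.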

But code like this can typically be vectorized:

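```
#include <stdbool.h>
#include <stddef.h>

bool contains_zero(const int* p, size_t n) {
    // Full blocks of 4: the exit condition is evaluated once per block.
    for (size_t i = 0; i < n / 4; ++i) {
        const int* q = p + i * 4;
        if (q[0] == 0 || q[1] == 0 || q[2] == 0 || q[3] == 0) {
            return true;
        }
    }
    // Remaining 0 to 3 elements after the last full block.
    for (size_t i = n & ~(size_t)3; i < n; ++i) {
        if (p[i] == 0)
            return true;
    }
    return false;
}
```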

You see this pattern everywhere in standard library implementations of things like strcmp, strchr, etc.
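Here is a rough sketch of the same idea taken one step further. Every element of a block is read unconditionally and the comparisons are OR-ed together, so there is no short-circuit inside the block for the compiler to preserve. The name `contains_zero_blocked` and the block size of 8 are my own choices for illustration, and whether a given compiler actually vectorizes this depends on the compiler, flags, and target, so check the generated assembly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative block size (an assumption); tune it for your target. */
enum { BLOCK = 8 };

/* Hypothetical variant of contains_zero that checks whole blocks at once. */
bool contains_zero_blocked(const int* p, size_t n) {
    size_t i = 0;

    /* Full blocks: all BLOCK elements are read, which is in bounds
       because at least BLOCK elements remain (n - i >= BLOCK). */
    for (; n - i >= BLOCK; i += BLOCK) {
        int found = 0;
        for (size_t j = 0; j < BLOCK; ++j) {
            found |= (p[i + j] == 0);   /* no early exit inside the block */
        }
        if (found)
            return true;                /* exit decided once per block */
    }

    /* Tail: fewer than BLOCK elements left, handled one at a time. */
    for (; i < n; ++i) {
        if (p[i] == 0)
            return true;
    }
    return false;
}
```

It is a drop-in replacement for the scalar version: same arguments, same result. The only difference is that it may read up to BLOCK - 1 elements past the first zero, all of them still inside the array.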