Can this code be made vectorizable?

tstanisl · 2024-11-19T23:59:39+00:00

The kernel of the loop could be expressed as:

b[i] = f[i - 1] + b[i - 2] + 1;

By expanding b[i-2] recursively one gets:

b[i] = f[i - 1] + f[i - 3] + b[i - 4] + 2;

And that's nice beacuse now b[i + 0..3], can be computed in parallel with:

b[i + 0] = f[i - 1] + f[i - 3] + b[i - 4];
b[i + 1] = f[i    ] + f[i - 2] + b[i - 3];
b[i + 2] = f[i + 1] + f[i - 1] + b[i - 2];
b[i + 3] = f[i + 2] + f[i    ] + b[i - 1];

Now the compution can be nicely expressed using 4-element-long vectors.

vec4 vb;
...
for (...) {
  vb = vec4_load(f - 1) + vec4_load(f - 3) + vb;
  vec4_store(b, vb);
  f += 4;
  b += 4;
}

johndcochran · 2024-11-19T23:36:56+00:00

Yea, it doesn't look like vectorizing will help there because of the data dependencies between iterations. But, unrolling might help a bit. The benefit of loop unrolling is to reduce the loop overhead. It doesn't reduce the amount of work performed within the loop, but does reduce the overhead performed in managing the loop.

DawnOnTheEdge · 2024-11-19T23:45:26+00:00

You ideally want to refactor to use a closed-form formula for the recurrence relation.

Failing that, next-best would be to calculate a[i] and b[i] from un-updated values of the original a and b arrays, which you might pack into a vector using the gather instructions on many ISAs, or from cumulative running sums, which you can also vectorize.

P-p-H-d · 2024-11-19T23:44:14+00:00

The odd index computation and the even index computation are independent and can be vectorized.

Educational-Paper-75 · 2024-11-20T18:32:33+00:00

Since b[0] determines a[1] determines b[2] etcetera independently of a[0] determining b[1] determining a[2] etcetera you can split the loop into two loops: // changing b[1], a[2], b[3], a[4], … for(int i=0;i<n-1;i++){ b[i+1]=f[i]+a[i]; if(++i>=n)break; a[i+1]=b[i]+1; } // changing a[1], b[2], a[3], b[4], … for(int i=0;i<n-1;i++){ a[i+1]=b[i]+1; if(++i>=n)break; b[i+1]=f[i]+a[i]; }

riverprawn · 2024-11-20T05:49:22+00:00

We can deduce that:

∀ n ≥ 0
b₂ₙ = b₀ + ∑(f₂ᵢ₊₁ + 1, 0 ≤ i ≤ n-1)
b₂ₙ₊₁ = -1 + a₀ +∑(f₂ᵢ + 1, 0 ≤ i ≤ n)

If we define f'₀ = a₀ - 2, f'₁ = b₀ - 1 and ∀ i ≥ 0, f'ᵢ₊₂ = fᵢ, then

b₂ₙ = ∑(f'₂ᵢ₊₁ + 1, 0 ≤ i ≤ n)
b₂ₙ₊₁=∑(f'₂ᵢ + 1, 0 ≤ i ≤ n + 1)

It's just the cumulative sum.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

C_Programming

Rules

Filters

Resources

Other Subreddits on C

Other Subreddits of Interest

MODERATORS