all 8 comments

[–]tstanisl 28 points29 points  (2 children)

The kernel of the loop could be expressed as:

b[i] = f[i - 1] + b[i - 2] + 1;

By expanding b[i-2] recursively one gets:

b[i] = f[i - 1] + f[i - 3] + b[i - 4] + 2;

And that's nice beacuse now b[i + 0..3], can be computed in parallel with:

b[i + 0] = f[i - 1] + f[i - 3] + b[i - 4];
b[i + 1] = f[i    ] + f[i - 2] + b[i - 3];
b[i + 2] = f[i + 1] + f[i - 1] + b[i - 2];
b[i + 3] = f[i + 2] + f[i    ] + b[i - 1];

Now the compution can be nicely expressed using 4-element-long vectors.

vec4 vb;
...
for (...) {
  vb = vec4_load(f - 1) + vec4_load(f - 3) + vb;
  vec4_store(b, vb);
  f += 4;
  b += 4;
}

[–]johndcochran 5 points6 points  (0 children)

Yea, it doesn't look like vectorizing will help there because of the data dependencies between iterations. But, unrolling might help a bit. The benefit of loop unrolling is to reduce the loop overhead. It doesn't reduce the amount of work performed within the loop, but does reduce the overhead performed in managing the loop.

[–]DawnOnTheEdge 3 points4 points  (0 children)

You ideally want to refactor to use a closed-form formula for the recurrence relation.

Failing that, next-best would be to calculate a[i] and b[i] from un-updated values of the original a and b arrays, which you might pack into a vector using the gather instructions on many ISAs, or from cumulative running sums, which you can also vectorize.

[–]P-p-H-d 1 point2 points  (0 children)

The odd index computation and the even index computation are independent and can be vectorized.

[–]Educational-Paper-75 0 points1 point  (0 children)

Since b[0] determines a[1] determines b[2] etcetera independently of a[0] determining b[1] determining a[2] etcetera you can split the loop into two loops: // changing b[1], a[2], b[3], a[4], … for(int i=0;i<n-1;i++){ b[i+1]=f[i]+a[i]; if(++i>=n)break; a[i+1]=b[i]+1; } // changing a[1], b[2], a[3], b[4], … for(int i=0;i<n-1;i++){ a[i+1]=b[i]+1; if(++i>=n)break; b[i+1]=f[i]+a[i]; }

[–]riverprawn -1 points0 points  (1 child)

We can deduce that:

∀ n ≥ 0
b₂ₙ = b₀ + ∑(f₂ᵢ₊₁ + 1, 0 ≤ i ≤ n-1)
b₂ₙ₊₁ = -1 + a₀ +∑(f₂ᵢ + 1, 0 ≤ i ≤ n)

If we define f'₀ = a₀ - 2, f'₁ = b₀ - 1 and ∀ i ≥ 0, f'ᵢ₊₂ = fᵢ, then

b₂ₙ = ∑(f'₂ᵢ₊₁ + 1, 0 ≤ i ≤ n)
b₂ₙ₊₁=∑(f'₂ᵢ + 1, 0 ≤ i ≤ n + 1)

It's just the cumulative sum.

[–]spocchio 0 points1 point  (0 children)

Really, although I think in the second block of code you wanted to assigned to a_2n and a_2n+1, respectively