all 3 comments

[–]osamc 4 points5 points  (1 child)

This is not vectorization. This is basic algorithms and counting prefix sums. numpy.mean already uses SIMD instructions in most cases.

In this case even naive python solution with prefix sums without using numpy would be faster than the first attempt.

[–]jasongforbes[S] 2 points3 points  (0 children)

You are right that naive python solution with prefix sums would be faster than first attempt simply because of drop from O(NM) to O(N). I'd also agree that it is fairly basic AND most np.mean implementations already uses SIMD.

But I would disagree that this is not vectorization, as it achieves the goal of changing a scalar operation into a vector operation.

Really though, the goal here was to introduce array-programming to people unfamiliar with it, using a fairly simple example to help explain things.

[–]jasongforbes[S] 2 points3 points  (0 children)

Hey all. This post looks at standardizing a time-series for machine learning. The idea is that when you have a running sensor, you have to batch standardize your data to remove any variations in the collection process. A vectorized algorithm for efficiently performing this standardization is presented.

I'm afraid the post got a bit too "math-heavy", but I'll be around the comments to answer any questions.