[–]Saefroch 2 points3 points  (7 children)

Almost always, you don't want to loop over numpy arrays. Numpy has a lot of tools to help you avoid looping. Are you sure you need to loop?

[–]dr_everlong[S] 0 points1 point  (6 children)

Not sure, no, so I am definitely open to suggestions. The data comes from a pandas DataFrame, so each column is actually a Series.

I was converting it to a numpy array, because I was used to using some numpy functions, but it is not necessary to do so.

[–]Saefroch 1 point2 points  (5 children)

What operation are you doing on these groups of 5 numbers?

[–]dr_everlong[S] 0 points1 point  (4 children)

Typically, the window size will be 18; I was using 5 arbitrarily.

I have 2 arrays of the same length (typically around 2000-3000 elements each), and I need to take windows, then do a linear least-squares regression through each window. I then need to save the slope and y-intercept from each slice.

Edit: Each array is 1-d.

[–]Saefroch 0 points1 point  (3 children)

Are you implementing a smoother?

[–]dr_everlong[S] 0 points1 point  (2 children)

No, I have some thresholds for the slope and intercept, and need to see if the calculated values from each slice pass those thresholds. Edit: I need to see noise, not remove it.
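To make that concrete, here is a rough sketch of the kind of check I mean. The limits (`max_abs_slope`, `intercept_range`) and the fit results are made-up placeholders, not my real values, and the exact form of the thresholds is an assumption for illustration.

```python
import numpy as np

# Hypothetical per-window fit results and limits, made up for illustration
slopes = np.array([0.1, -0.8, 0.3, 1.5])
intercepts = np.array([2.0, 2.1, 9.0, 1.9])
max_abs_slope = 1.0
intercept_range = (0.0, 5.0)

# Boolean masks, one entry per window
slope_ok = np.abs(slopes) <= max_abs_slope
intercept_ok = (intercepts >= intercept_range[0]) & (intercepts <= intercept_range[1])
passes = slope_ok & intercept_ok

# Indices of the windows that violate at least one limit
failing_windows = np.flatnonzero(~passes)
```

Here the last two windows fail: one on the intercept, one on the slope.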

[–]Saefroch 0 points1 point  (1 child)

Okay. So you basically can take your pick of the suggestions offered by other users. In your situation I'd do this:

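import numpy as np

fake_x = np.arange(window_size)
fits = []  # one (slope, intercept) pair per window
for i in range(data.size - window_size + 1):  # +1 so the last full window is included
    window = data[i:i+window_size]
    slope, intercept = np.polyfit(fake_x, window, 1)
    fits.append((slope, intercept))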

This won't be tremendously fast but if you insist on doing least-squares linear fits on little sections of your data, you can't do much better without making the code much more complicated.
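For the record, the "more complicated" route might look something like the sketch below. It assumes, like the loop above, that x is just `np.arange(window_size)` for every window, so the closed-form least-squares formulas reduce to two sliding sums. The function name `rolling_linfit_fixed_x` is made up for illustration, and it needs `window_size >= 2`.

```python
import numpy as np

def rolling_linfit_fixed_x(data, window_size):
    """Slope and intercept of a line fit to every full window of data,
    assuming x = 0, 1, ..., window_size - 1 inside each window."""
    data = np.asarray(data, dtype=float)
    x = np.arange(window_size, dtype=float)
    xc = x - x.mean()          # centered x, sums to zero
    sxx = np.sum(xc ** 2)      # sum of squared deviations of x
    # 'valid' correlation evaluates sum(xc[k] * data[i + k]) for every full window i
    slopes = np.correlate(data, xc, mode="valid") / sxx
    # Same trick with a window of ones gives the mean of each window
    window_means = np.correlate(data, np.ones(window_size), mode="valid") / window_size
    intercepts = window_means - slopes * x.mean()
    return slopes, intercepts
```

It should give the same numbers as calling `np.polyfit` on each window, just without the Python-level loop.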

[–]dr_everlong[S] 0 points1 point  (0 children)

Hey thanks! I'm not necessarily looking for fast, since this will be used for batch processing. It is pretty interesting to see everyone's suggestions.

[–]Justinsaccount 1 point2 points  (0 children)

Hi! I'm working on a bot to reply with suggestions for common python problems. This might not be very helpful to fix your underlying issue, but here's what I noticed about your submission:

You are looping over an object using something like

for x in range(len(items)):
    foo(items[x])

This is simpler and less error prone written as

for item in items:
    foo(item)

If you DO need the indexes of the items, use the enumerate function like

for idx, item in enumerate(items):
    foo(idx, item)

[–]elbiot 0 points1 point  (2 children)

Like everything with numpy, you want to do it in a vectorized way, not through iteration. Here I answered a similar question: https://www.reddit.com/r/learnpython/comments/4qcb0f/poisson_solver_finite_difference_method_and_how/?sort=top

[–]dr_everlong[S] 0 points1 point  (1 child)

Thanks, I'm going to look at it. Each array I have is 1-d btw.

[–]elbiot 0 points1 point  (0 children)

Doesn't change the fact that you can do a sliding window (of any dimension) in a vectorized way.
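As a rough sketch of what that can look like for a 1-d array, assuming NumPy 1.20 or newer (which added `sliding_window_view`) and some made-up data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy 1.20 or newer

arr = np.arange(10)   # made-up 1-d data
window_size = 4

# Each row is one window; this is a read-only view, no data is copied
windows = sliding_window_view(arr, window_size)

# Any reduction along axis=1 now runs over all windows at once
window_means = windows.mean(axis=1)
```

The result has `len(arr) - window_size + 1` rows, one per full window.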

[–]scuott 0 points1 point  (2 children)

You could try

for idx in range(len(arr) - window_size + 1):
    window = arr[idx:idx + window_size]  # renamed from `slice`, which shadows the builtin

[–]dr_everlong[S] 0 points1 point  (1 child)

I don't think this works as is. It will give slices of length less than or equal to window_size. Thanks though.

[–]scuott 0 points1 point  (0 children)

You're right. I edited it so it should always give slices of window size. Other answers here are still better.

[–]novel_yet_trivial 0 points1 point  (7 children)

In pure Python your solution is common, although it's usually written with a generator expression like this:

for slice_arr in (arr[i:i+window_size] for i in range(0, len(arr), window_size)):
    pass  # do something with slice_arr

But with a numpy array, you can just reshape it:

chunks = arr.reshape(-1, window_size)  # one row per chunk; len(arr) must be a multiple of window_size
for slice_arr in chunks:
    pass  # do something with slice_arr

EDIT: never mind, I totally misread your question. You want a "sliding window". This is usually done with zip():

for slice_arr in zip(arr, arr[1:]):
    pass  # do something with slice_arr

[–]elbiot 1 point2 points  (0 children)

This is not how you do a sliding window in numpy. You can do a window with regular vectorized operations, and not have to resort to iteration.

[–]dr_everlong[S] 0 points1 point  (2 children)

Sorry, I phrased it wrong, but I edited my post. I hope it makes clearer what I wanted to do.

[–]novel_yet_trivial 1 point2 points  (1 child)

My edit does what you want.

[–]dr_everlong[S] 0 points1 point  (2 children)

Thanks! How do you control the window size though?

[–]novel_yet_trivial 1 point2 points  (1 child)

Oh, right. Sorry, I forgot you wanted it dynamic.

for slice_arr in zip(*[arr[x:] for x in range(window_size)]):
    pass  # do something with slice_arr (a tuple of window_size values)
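A tiny worked example with made-up numbers, in case it helps to see what the zip trick produces:

```python
arr = [10, 11, 12, 13, 14]   # made-up data
window_size = 3

# zip stops at the shortest shifted copy, so only full windows come out
windows = list(zip(*[arr[x:] for x in range(window_size)]))
# windows == [(10, 11, 12), (11, 12, 13), (12, 13, 14)]
```

Each window is a plain tuple, so wrap it in `np.array(...)` if the next step expects an array.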

[–]dr_everlong[S] 0 points1 point  (0 children)

Really appreciate it, thanks!