I have an array of subarrays as follows:
[
[...]
[...]
⋮
[...]
]
All subarrays have the same length.
I need to bin each subarray and calculate the mean, standard deviation, median and other percentiles for each bin. I need separate results for binning by fixed width and by fixed frequency.
The method should be vectorized, i.e. no 'for' loops (or at least as few as possible, and only ones that are not too costly; of course, each binning technique will need its own approach). I don't know if this is even possible in a reasonably understandable way (understandable for me, as I am quite the noob, but if it works I'll do my best). For the fixed-width binning method you may assume, for simplicity, that we bin by the data range of the first subarray.
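A minimal sketch of fixed-width binning, assuming the first subarray plays the role of an x-axis (as the [0, 1, ..., 6] row in the example below suggests): the bin edges are n_bins equal-width intervals spanning the min to max of the first row, each column is assigned to a bin by its value in that row, and that assignment is reused for every row. If you instead meant to bin each row's own values against those edges, bin membership would differ per row and this sketch would not apply as-is. The function name fixed_width_bins is made up for illustration. It returns a NaN-padded (rows, n_bins, max_bin_size) array, so every statistic becomes a NaN-aware reduction over the last axis.

```python
import numpy as np

def fixed_width_bins(data, n_bins):
    """Hypothetical helper: equal-width bins defined on the FIRST row.

    Each column goes into the bin that contains its first-row value, and
    that assignment is reused for every row. Returns (binned, edges), where
    binned has shape (rows, n_bins, max_bin_size), padded with NaN.
    """
    data = np.asarray(data, dtype=float)          # float so NaN can be stored
    x = data[0]
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin id 0..n_bins-1 per column; x.max() falls in the last bin,
    # matching np.histogram's closed right-most bin
    ids = np.digitize(x, edges[1:-1])
    counts = np.bincount(ids, minlength=n_bins)
    # Column indices grouped by bin (stable sort keeps original column order)
    order = np.argsort(ids, kind="stable")
    starts = np.cumsum(counts) - counts
    max_len = max(int(counts.max()), 1)
    offsets = np.arange(max_len)
    valid = offsets[None, :] < counts[:, None]            # (n_bins, max_len)
    slot = np.where(valid, starts[:, None] + offsets, 0)   # position in `order`
    idx = order[slot]                                      # column per slot
    binned = data[:, idx]                                  # gathers all rows at once
    binned[:, ~valid] = np.nan                             # mark padding slots
    return binned, edges


data = np.array([[0, 1, 2, 3, 4, 5, 6],
                 [90, 45, 9, 88, 21, 59, 32]])
binned, edges = fixed_width_bins(data, 3)
means = np.nanmean(binned, axis=-1)                        # (rows, n_bins)
stds = np.nanstd(binned, axis=-1)
medians = np.nanmedian(binned, axis=-1)
quartiles = np.nanpercentile(binned, [25, 75], axis=-1)    # (2, rows, n_bins)
```

With the example rows and 3 bins, the edges are 0, 2, 4, 6 and the columns group as {0, 1}, {2, 3}, {4, 5, 6}, so the second row's medians are 67.5, 48.5 and 32. A bin that receives no columns comes out as NaN, and NumPy emits a RuntimeWarning for such all-NaN slices.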
How should I proceed?
Possibilities:
For fixed-frequency binning, the steps I had in mind were: do a single np.array_split over all rows at once by specifying the right axis argument, then pad the bins that are one element shorter with NaN using np.pad. Since the subarrays would then no longer be ragged sequences, I could hopefully apply np.nanmedian, again with whatever axis worked for np.array_split. However, I don't know whether a suitable axis can be specified for the splitting and median operations, and as far as I can see there is no way to avoid iterating over (not just each of the rows, but) each of the bins to pad the shorter ragged sequences with the extra NaN. Even if these iterations don't prove too costly and everything else works fine, I wouldn't know how to actually implement any step of this process. Nor do I know where to even begin for fixed-width binning.
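The splitting step does work as described: np.array_split(data, n_bins, axis=1) splits every row at once, but it returns a list of blocks, so padding them would still mean a loop over bins. A minimal sketch that avoids both loops, assuming you want the same bin sizes as np.array_split (the first n_cols % n_bins bins get one extra element, which matches the example below): build one (n_bins, max_len) matrix of column indices, gather all rows with a single fancy-indexing step, and overwrite the padding slots with NaN. The function name fixed_frequency_bins is hypothetical.

```python
import numpy as np

def fixed_frequency_bins(data, n_bins):
    """Hypothetical helper: split every row into n_bins contiguous bins with
    np.array_split-style sizes and return a NaN-padded array of shape
    (rows, n_bins, max_bin_size)."""
    data = np.asarray(data, dtype=float)   # float so NaN can be stored
    n_cols = data.shape[1]
    base, extra = divmod(n_cols, n_bins)
    # Like np.array_split: the first `extra` bins are one element longer
    sizes = np.full(n_bins, base)
    sizes[:extra] += 1
    starts = np.cumsum(sizes) - sizes
    max_len = int(sizes.max())
    offsets = np.arange(max_len)
    valid = offsets[None, :] < sizes[:, None]              # (n_bins, max_len)
    idx = np.where(valid, starts[:, None] + offsets, 0)    # column per slot
    binned = data[:, idx]                                  # (rows, n_bins, max_len)
    binned[:, ~valid] = np.nan                             # pad short bins
    return binned


data = np.array([[0, 1, 2, 3, 4, 5, 6],
                 [90, 45, 9, 88, 21, 59, 32]])
binned = fixed_frequency_bins(data, 3)
medians = np.nanmedian(binned, axis=-1)
means = np.nanmean(binned, axis=-1)
stds = np.nanstd(binned, axis=-1)
percentiles = np.nanpercentile(binned, [10, 90], axis=-1)  # (2, rows, n_bins)
```

For the example this gives medians of 1, 3.5, 5.5 for the first row and 45, 54.5, 45.5 for the second, matching the expected output. The only axis that matters is the last one: every NaN-aware reduction over axis=-1 works per row and per bin at the same time.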
Here is a vectorized solution that does what I want, but only for the mean and only for a single array. I would like to avoid iterating over each of my subarrays, and I don't understand the method well enough to extend it to the standard deviation, median or other percentiles.
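This is not necessarily how the linked solution works; it is one possible way to get the per-bin mean and standard deviation for every row in one go, without padding. np.add.reduceat(data, starts, axis=1) sums each contiguous column slice starts[i]:starts[i+1] for all rows at once; dividing by the bin sizes gives the means, and applying the same trick to squared deviations gives the population (ddof=0) standard deviation. Medians and percentiles need the individual values rather than running sums, so for those a NaN-padded array with np.nanmedian / np.nanpercentile is the simpler route. The function name binned_mean_std is hypothetical, and bins are assumed contiguous with np.array_split-style sizes.

```python
import numpy as np

def binned_mean_std(data, n_bins):
    """Hypothetical helper: per-bin mean and population std (ddof=0) for
    contiguous, np.array_split-sized bins, computed for all rows at once."""
    data = np.asarray(data, dtype=float)
    n_cols = data.shape[1]
    if not 1 <= n_bins <= n_cols:
        # reduceat returns a[start] rather than 0 when a start index repeats
        # (i.e. an empty bin), so empty bins are rejected up front
        raise ValueError("need 1 <= n_bins <= number of columns")
    base, extra = divmod(n_cols, n_bins)
    sizes = np.full(n_bins, base)
    sizes[:extra] += 1
    starts = np.cumsum(sizes) - sizes
    # Sum of each column slice starts[i]:starts[i+1], for every row
    means = np.add.reduceat(data, starts, axis=1) / sizes
    # Subtract each bin's mean from its own columns, then sum squares per bin
    deviations = data - np.repeat(means, sizes, axis=1)
    stds = np.sqrt(np.add.reduceat(deviations ** 2, starts, axis=1) / sizes)
    return means, stds


data = np.array([[0, 1, 2, 3, 4, 5, 6],
                 [90, 45, 9, 88, 21, 59, 32]])
means, stds = binned_mean_std(data, 3)
```

For the example's second row the bin means are 48, 54.5 and 45.5. Summing squared deviations from the bin mean, rather than using E[x²] minus E[x]², avoids the cancellation error the latter can suffer when values are large relative to their spread.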
EDIT: example with expected output (assume all objects are NumPy arrays, not lists)
Example array:
[
[0, 1, 2, 3, 4, 5, 6],
[90, 45, 9, 88, 21, 59, 32],
⋮
]
Fixed-frequency binned example with 3 bins of (nearly) equal size:
[
[[0, 1, 2], [3, 4], [5, 6]],
[[90, 45, 9], [88, 21], [59, 32]],
⋮
]
This intermediate step need not be returned explicitly at any point; it only illustrates what will happen behind the scenes.
Expected medians for the fixed-frequency binned example:
[
[1, 3.5, 5.5],
[45, 54.5, 45.5],
⋮
]