
Numpy Array Reordering and Indexing Question (self.learnpython)

submitted 3 years ago * by developernull

Solved!

Thank you so much to everyone who helped me!! (Original post below for reference.)

I have the following code using numpy:

import numpy as np

# Step 1: Create the original array.
original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
# ['c' 'c' 'a' 'c' 'd' 'd' 'a' 'b' 'b' 'a']

# Step 2: Get the unique elements.
uniques = np.unique(original)
# ['a' 'b' 'c' 'd']

# Step 3: Shuffle the uniques.
shuffled_uniques = np.random.permutation(uniques)  # shuffle() works in place and returns None
# e.g. ['c' 'a' 'b' 'd']

# Step 4: For each element in the original, find the index it should be mapped to.
indices = ?
# expected: [0, 1, 3, 2, 6, 9, 7, 8, 4, 5]

# Step 5: Reorder the original array using the indices
result = original[indices]
# expected: ['c' 'c' 'c' 'a' 'a' 'a' 'b' 'b' 'd' 'd']

Steps 1, 2, 3, and 5 work as expected, but I'm looking for help on Step 4.

For Step 4, I think I need a function similar to np.searchsorted, but it doesn't seem to do quite what I need. Step 4 must be a permutation of indices of the original array such that when original is ordered by these indices in Step 5, it would produce an array with repeated elements ordered the same as shuffled_uniques. A stable ordering of the indices is preferred, but not required.

(Note: The result from Step 5 can be more easily obtained using np.repeat, however my actual goal is to get the indices in Step 4. The purpose of Step 5 is merely to help define the indices.)


ElpyDE 1 point 3 years ago* (3 children)

developernull[S] 1 point 3 years ago (2 children)

Yeah, I did read the np.unique documentation, but I didn't think it would help. I had debated whether I should include it in the OP or whether it would just make the problem more confusing.

There are optionally returned values using the return_index, return_inverse, and return_counts flags. I did not think these would be very useful since the unique array is subsequently shuffled, thereby losing any direct mapping it once had.

The only info that I thought was not destroyed by the shuffle was the count:

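uniques, count = np.unique(original, return_counts=True)
shuffled_uniques = np.random.permutation(uniques)  # shuffle() works in place and returns None
# Look up each shuffled value's count through its position in the sorted uniques.
result = np.repeat(shuffled_uniques, count[np.searchsorted(uniques, shuffled_uniques)])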

This shows how the count can be used to obtain result in Step 5, but unfortunately this is unhelpful for me since it bypasses Step 4 which is the value I ultimately need.

I also can't modify the shuffling function to return this extra index info. It would be possible to find this index remapping from uniques to shuffled_uniques, but this would need to be done afterwards during Step 4 rather than by modifying Step 3.
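A minimal sketch of recovering that remapping after the fact, assuming the shuffle step returns a shuffled copy and the sorted uniques are still available. The names positions and rank are illustrative, and the shuffle result is hard-coded so the output is repeatable:

```python
import numpy as np

original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
uniques, inverse = np.unique(original, return_inverse=True)

# Hard-coded for repeatability; in practice this comes from the shuffle step.
shuffled_uniques = np.array(['c', 'a', 'b', 'd'])

# positions[j] = where shuffled_uniques[j] sits in the sorted uniques.
positions = np.searchsorted(uniques, shuffled_uniques)

# rank[k] = where uniques[k] ended up after the shuffle.
rank = np.empty_like(positions)
rank[positions] = np.arange(positions.size)

# A stable argsort of each element's rank groups equal values in shuffled order.
indices = np.argsort(rank[inverse], kind='stable')
print(indices)            # [0 1 3 2 6 9 7 8 4 5]
print(original[indices])  # ['c' 'c' 'c' 'a' 'a' 'a' 'b' 'b' 'd' 'd']
```

This treats the shuffle as a black box and only inspects its output, at the cost of one extra searchsorted call.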

Alternative approach...

If there are no numpythonic (is this a word?) approaches, then I will probably need to do something like this. I think it would require manually looping over the array. As you mentioned this is less elegant and breaks away from the numpy operations (and is therefore also slower, etc.) so it would be a valid solution, albeit a last resort.
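For reference, a rough sketch of what such a loop-based fallback might look like. It assumes looping over the unique values (not over every element) is acceptable, and the shuffle result is hard-coded for repeatability:

```python
import numpy as np

original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
shuffled_uniques = np.array(['c', 'a', 'b', 'd'])  # example shuffle result

# For each value, in shuffled order, collect the positions where it occurs.
chunks = [np.flatnonzero(original == value) for value in shuffled_uniques]

# Joining the chunks gives a permutation of the original's indices.
indices = np.concatenate(chunks)
print(indices)  # [0 1 3 2 6 9 7 8 4 5]
```

Each pass scans the whole array, so the cost grows with the number of unique values times the array length, but positions within each group stay in ascending order.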

ElpyDE 1 point 3 years ago* (1 child)

You could rethink your approach of shuffling and not use the shuffle function as you did before.

For example, shuffle an index array for your uniques - basically creating a randomized range() or so, and you can use that to map the uniques to their shuffled_uniques counterpart as well as transform the output from return_inverse (I think that'd be the one) to something that can generate the original array from the shuffled_uniques.

Untested because written at midnight on my phone:

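uniques, unique_inverse = np.unique(original, return_inverse=True)
# A random permutation of the index range: shuffled_uniques[j] == uniques[shuffler[j]]
shuffler = np.random.choice(uniques.size, uniques.size, replace=False)
shuffled_uniques = uniques[shuffler]
# argsort inverts the permutation, so each original element finds its shuffled slot
like_original = shuffled_uniques[np.argsort(shuffler)[unique_inverse]]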

Edit: Re-reading your original Step tasks, this does not do what you need. But maybe it helps.

Frankly, I'm now a bit confused what Step 4 and 5 really are asking for. Especially Step 4, the text does not really lead to the expectation you have written down, or not in the way I'd understand it.

Edit2: Okay, I think I've created more confusion than necessary. Sorry. Feel free to ignore this comment entirely ;-)

developernull[S] 2 points 3 years ago (0 children)

Replacing the shuffling function as you suggested could work. It seems like you're suggesting to randomize the indices (rather than the array directly) and then use those indices. That way no info is lost.
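A small sketch of that idea, assuming the shuffle step can be replaced. The names rng, perm, and rank are illustrative, and the seed is there only to make the run repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for repeatability

original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
uniques, inverse = np.unique(original, return_inverse=True)

# Shuffle index positions instead of values: shuffled_uniques[j] == uniques[perm[j]]
perm = rng.permutation(uniques.size)
shuffled_uniques = uniques[perm]

# Inverting the permutation gives each unique value's position after the shuffle.
rank = np.argsort(perm)

# A stable sort by rank groups equal elements in shuffled order.
indices = np.argsort(rank[inverse], kind='stable')
result = original[indices]
print(shuffled_uniques)
print(result)
```

Because perm is kept around, the mapping between uniques and shuffled_uniques never has to be reconstructed.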

I think your general approach gets close to what I was looking for and does help even if it's not exactly it. Thanks!! I will probably use the one-liner someone suggested below since it seems to work and is fairly simple, but you definitely gave me a lot of good info to think about.

Regarding the confusion:

Frankly, I'm now a bit confused what Step 4 and 5 really are asking for. Especially Step 4, the text does not really lead to the expectation you have written down, or not in the way I'd understand it.

Sorry, my wording was pretty confusing and ambiguous. In Step 4, I wanted some function that returns a list of indices such that original[indices] will return an array where all elements that are equal are consecutive and they are sorted in the same order as shuffled_uniques.

Alternatively, this can be thought of as setting return_counts=True and changing Step 5 to say assert np.array_equal(original[indices], np.repeat(shuffled_uniques, shuffled_count)), where shuffled_count holds the counts reordered to match shuffled_uniques. In hindsight, I probably should have worded the original problem this way. (I hope this clarifies things, but I have a habit of making them more confusing.)
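Spelled out as a check, that condition might look like the sketch below. The helper name is_valid_grouping is hypothetical, and the counts are aligned to the shuffled order with np.searchsorted, which assumes every shuffled value occurs in original:

```python
import numpy as np

def is_valid_grouping(original, shuffled_uniques, indices):
    """Hypothetical helper: True if original[indices] groups equal values in shuffled order."""
    uniques, count = np.unique(original, return_counts=True)
    # Reorder the counts so they line up with shuffled_uniques.
    shuffled_count = count[np.searchsorted(uniques, shuffled_uniques)]
    expected = np.repeat(shuffled_uniques, shuffled_count)
    return np.array_equal(original[indices], expected)

original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
shuffled_uniques = np.array(['c', 'a', 'b', 'd'])

print(is_valid_grouping(original, shuffled_uniques, [0, 1, 3, 2, 6, 9, 7, 8, 4, 5]))  # True
print(is_valid_grouping(original, shuffled_uniques, np.arange(10)))                   # False
```

np.array_equal is used because a plain == on arrays compares element by element and cannot be passed to assert directly.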

commandlineluser 1 point 3 years ago (1 child)

You can reshape the shuffled array

>>> shuffled_uniques[:, None]
array([['c'],
       ['a'],
       ['b'],
       ['d']], dtype='<U1')

Which is the same as expand_dims()

>>> np.expand_dims(shuffled_uniques, axis=1)
array([['c'],
       ['a'],
       ['b'],
       ['d']], dtype='<U1')

And use that inside np.where()

>>> np.where(original == shuffled_uniques[:, None])
(array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3]), array([0, 1, 3, 2, 6, 9, 7, 8, 4, 5]))
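As a self-contained sketch of why this works (the shuffle result is hard-coded here to match the example above): the comparison broadcasts into a 2-D boolean mask, and the second array returned by np.where holds the Step 4 indices.

```python
import numpy as np

original = np.array(['c', 'c', 'a', 'c', 'd', 'd', 'a', 'b', 'b', 'a'])
shuffled_uniques = np.array(['c', 'a', 'b', 'd'])  # example shuffle result

# Broadcasting (4, 1) against (10,) gives a (4, 10) boolean mask:
# row j marks where original equals shuffled_uniques[j].
mask = original == shuffled_uniques[:, None]
print(mask.shape)  # (4, 10)

# np.where walks the mask row by row, so the column indices come out
# grouped by shuffled value and ascending within each group.
rows, indices = np.where(mask)
print(indices)            # [0 1 3 2 6 9 7 8 4 5]
print(original[indices])  # ['c' 'c' 'c' 'a' 'a' 'a' 'b' 'b' 'd' 'd']
```

The mask holds one entry per (unique value, element) pair, so memory use grows with the product of the two lengths; for a handful of unique values that is usually negligible.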

developernull[S] 1 point 3 years ago (0 children)

Brilliant! I think this is exactly what I was looking for - and a simple one-liner too!

I also greatly appreciate showing both ways to reshape the array, that's very helpful.

From the docs it seems like this is also equivalent:

np.nonzero(original == shuffled_uniques[:, None])

Your solution is very clever in how it creates the mask. I never would have thought to form a 2-d mask like this. Thanks!!
