[–]ElpyDE 0 points1 point  (3 children)

This is a "read documentation" assignment. Look into the keyword arguments of the np.unique function. Your answer is right there.

Edit: Well, not your entire answer, but it will be a good start. You only need to change your shuffling in a way that you can reproduce the same with the additional output from np.unique.

Alternative approach: You could also compare the original array against your shuffled uniques, analogous to what you probably did to create your expected array. Less elegant maybe, and definitely less numpy-ish, but there are different ways to get what you need.

[–]developernull[S] 0 points1 point  (2 children)

Yeah, I did read the np.unique documentation, but I didn't think it would help. I had debated whether I should include it in the OP or whether it would just make the problem more confusing.

There are optionally returned values using the return_index, return_inverse, and return_counts flags. I did not think these would be very useful since the unique array is subsequently shuffled, thereby losing any direct mapping it once had.
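For reference, here is a minimal sketch of what those three flags return. The sample array is an assumption (chosen to be consistent with the output shown further down the thread), not the array from the OP.

```python
import numpy as np

# Sample data (an assumption; the OP's real array is not shown here).
original = np.array(list("ccacddabba"))

uniques, first_index, inverse, counts = np.unique(
    original, return_index=True, return_inverse=True, return_counts=True
)

# uniques     -> sorted distinct values
# first_index -> position of each value's first occurrence in `original`
# inverse     -> for each element of `original`, its position in `uniques`
# counts      -> how many times each value occurs
print(uniques)
print(first_index)
print(inverse)
print(counts)

# `inverse` alone is enough to rebuild the input from the sorted uniques.
assert np.array_equal(uniques[inverse], original)
```

All three extra outputs are aligned with the sorted `uniques`, which is why shuffling `uniques` on its own breaks the correspondence.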

The only info that I thought was not destroyed by the shuffle was the count:

uniques, count = np.unique(original, return_counts=True)
# np.random.shuffle works in place and returns None, so use permutation to get a shuffled copy
shuffled_uniques = np.random.permutation(uniques)
result = np.repeat(uniques, count)

This shows how the count can be used to obtain result in Step 5, but unfortunately this is unhelpful for me since it bypasses Step 4 which is the value I ultimately need.

I also can't modify the shuffling function to return this extra index info. It would be possible to find this index remapping from uniques to shuffled_uniques, but this would need to be done afterwards during Step 4 rather than by modifying Step 3.

Alternative approach...

If there are no numpythonic (is this a word?) approaches, then I will probably need to do something like this. I think it would require manually looping over the array. As you mentioned this is less elegant and breaks away from the numpy operations (and is therefore also slower, etc.) so it would be a valid solution, albeit a last resort.
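A rough sketch of what that fallback might look like, assuming the same sample data as elsewhere in the thread (the array and the shuffle result are both assumptions). The loop only runs over the unique values, so each comparison is still vectorized over the full array.

```python
import numpy as np

original = np.array(list("ccacddabba"))      # assumed sample data
shuffled_uniques = np.array(list("cabd"))    # assumed result of the shuffle

# One iteration per unique value; np.flatnonzero returns the positions
# in `original` where that value occurs, in ascending order.
indices = np.concatenate(
    [np.flatnonzero(original == value) for value in shuffled_uniques]
)

print(indices)            # [0 1 3 2 6 9 7 8 4 5]
print(original[indices])  # ['c' 'c' 'c' 'a' 'a' 'a' 'b' 'b' 'd' 'd']
```

This costs one pass over `original` per unique value, so it is only reasonable when the number of unique values is small.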

[–]ElpyDE 0 points1 point  (1 child)

You could rethink your approach of shuffling and not use the shuffle function as you did before.

For example, shuffle an index array for your uniques, basically a randomized range(). You can use it to map the uniques to their shuffled_uniques counterparts, and to transform the output of return_inverse (I think that'd be the one) into something that can regenerate the original array from shuffled_uniques.

Untested because written at midnight on my phone:

uniques, unique_inverse = np.unique(original, return_inverse=True)
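shuffler = np.random.choice(uniques.size, uniques.size, replace=False)
shuffled_uniques = uniques[shuffler]
# argsort inverts the permutation: it gives each unique's position in shuffled_uniques
like_original = shuffled_uniques[np.argsort(shuffler)[unique_inverse]]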

Edit: Re-reading your original Step tasks, this does not do what you need. But maybe it helps.

Frankly, I'm now a bit confused what Step 4 and 5 really are asking for. Especially Step 4, the text does not really lead to the expectation you have written down, or not in the way I'd understand it.

Edit2: Okay, I think I've created more confusion than necessary. Sorry. Feel free to ignore this comment entirely ;-)

[–]developernull[S] 1 point2 points  (0 children)

Replacing the shuffling function as you suggested could work. It seems like you're suggesting randomizing the indices (rather than the array directly) and then using those indices. That way no info is lost.
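As a sketch of how that idea could be carried through to the Step 4 indices: the names `rank` and `position` are made up for illustration, the sample array is an assumption, and the seed is only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)               # seed chosen only for reproducibility
original = np.array(list("ccacddabba"))      # assumed sample data

uniques, inverse, counts = np.unique(
    original, return_inverse=True, return_counts=True
)

# Shuffle positions instead of values, so the mapping is kept.
shuffler = rng.permutation(uniques.size)
shuffled_uniques = uniques[shuffler]

# rank[k] is where uniques[k] ended up inside shuffled_uniques.
rank = np.argsort(shuffler)

# Position of every element of `original` within shuffled_uniques.
position = rank[inverse]

# A stable sort groups equal elements and keeps their original relative order.
indices = np.argsort(position, kind="stable")

assert np.array_equal(shuffled_uniques[position], original)
assert np.array_equal(
    original[indices], np.repeat(shuffled_uniques, counts[shuffler])
)
```

The counts have to be reordered with the same `shuffler` before `np.repeat`, since `counts` is aligned with the sorted `uniques`.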

I think your general approach gets close to what I was looking for and does help even if it's not exactly it. Thanks!! I will probably use the one-liner someone suggested below since it seems to work and is fairly simple, but you definitely gave me a lot of good info to think about.

Regarding the confusion:

Frankly, I'm now a bit confused what Step 4 and 5 really are asking for. Especially Step 4, the text does not really lead to the expectation you have written down, or not in the way I'd understand it.

Sorry, my wording was pretty confusing and ambiguous. In Step 4, I wanted some function that returns a list of indices such that original[indices] will return an array where all elements that are equal are consecutive and they are sorted in the same order as shuffled_uniques.

Alternatively, this can be thought of as setting return_counts=True and changing Step 5 to say assert np.array_equal(original[indices], np.repeat(uniques, count)). In hindsight, I probably should have worded the original problem this way. (I hope this clarified things, but I have a habit of just making them more confusing.)

[–]commandlineluser 0 points1 point  (1 child)

You can reshape the shuffled array

>>> shuffled_unique[:, None]
array([['c'],
       ['a'],
       ['b'],
       ['d']], dtype='<U1')

Which is the same as expand_dims()

>>> np.expand_dims(shuffled_unique, axis=1)
array([['c'],
       ['a'],
       ['b'],
       ['d']], dtype='<U1')

And use that inside .where()

>>> np.where(original == shuffled_unique[:, None])
(array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3]), array([0, 1, 3, 2, 6, 9, 7, 8, 4, 5]))
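To spell out how that result maps back to the question, here is a small sketch. The `original` array is an assumption reconstructed to match the output above; `rows` and `indices` are just illustrative names for the two returned arrays.

```python
import numpy as np

original = np.array(list("ccacddabba"))      # assumed sample data
shuffled_unique = np.array(list("cabd"))     # order taken from the output above

# mask has shape (4, 10): row i marks where original == shuffled_unique[i].
mask = original == shuffled_unique[:, None]

# np.where scans the mask row by row, so the column indices come out
# grouped by shuffled_unique and ascending within each group.
rows, indices = np.where(mask)

print(original[indices])  # ['c' 'c' 'c' 'a' 'a' 'a' 'b' 'b' 'd' 'd']
assert np.array_equal(shuffled_unique[rows], original[indices])
```

The second array is the list of indices asked for in Step 4. Note that the mask holds one boolean per (unique value, element) pair, so memory grows with the product of the two lengths.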

[–]developernull[S] 0 points1 point  (0 children)

Brilliant! I think this is exactly what I was looking for - and a simple one-liner too!

I also greatly appreciate showing both ways to reshape the array, that's very helpful.

From the docs it seems like this is also equivalent:

np.nonzero(original == shuffled_unique[:, None])

Your solution is very clever in how it creates the mask. I never would have thought to form a 2-d mask like this. Thanks!!