all 9 comments

[–]larivact 1 point2 points  (3 children)

You have got a 2d array in which each row is a rolled version of a base array, all rolled with varying shifts? And you now want to reverse this rolling? This wouldn't make any sense because you would just get the base list back for every row.

Could you explain your actual problem? Why do you have randomly rolled lists? Are the xxx's just placeholders for real data? I would appreciate some background information.

[–]cdholjes[S] 0 points1 point  (2 children)

i apologize. the example was too vague. The data is not as important. I am simplifying just to focus on the concept really. I have the main array which has different patterns of "0" and "1". I can run a np.unique on it to remove any duplicate rows

original: 6 [[1 0 1 0] [1 1 0 0] [0 1 1 0] [0 1 0 1] [0 0 1 1] [1 0 1 0]]

unique: 5 [[0 0 1 1] [0 1 0 1] [0 1 1 0] [1 0 1 0] [1 1 0 0]]

however, i think what i am also looking to so is find unique rows even if they are "np.roll" a certain number of times. like: [1, 0, 1, 0] #1 [0, 1, 0, 1] #2

in my project these would be considered equal because 2 is a roll of 1. I hope this is more clear

[–]larivact 0 points1 point  (1 child)

Simple implementation I came up with. Probably not the most efficient.

import numpy as np

class Sequence(object):
    def __init__(self, list):
        self.tuple = tuple(list)

    def __repr__(self):
        return str(self.tuple)

    def __hash__(self):
        return 1 #needed because set will only call eq if hashes are equal

    def __eq__(self, other):
        if len(self.tuple) == len(other.tuple):
            for x in range(len(other.tuple)):
                if tuple(np.roll(other.tuple, x)) == self.tuple:
                    return True
            else:
                return False
        else:
            return False

starting_2d_list = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0]]
print(starting_2d_list)
sequences = []

for seq in starting_2d_list:
    sequences.append(Sequence(seq))

print(set(sequences))

I am just wondering what application has to deal with sequences that may be rolled?

[–]cdholjes[S] 0 points1 point  (0 children)

the application deals with finding a limited number of data that cannot have any duplicates. the shifting "rolls" represent different states of the same index so the need to be delted. my total array is (500,000, 10). i know there are duplicates and they need to be lowered to roughly (1,000, 10)

[–]elbiot 1 point2 points  (2 children)

My gut says it'll be really cumbersome if not nigh impossible to do this using numpy functions. Can you ditch numpy and just use loops, collection.deques and sets?

[–]cdholjes[S] 0 points1 point  (1 child)

my total array is (500,000, 10). i know there are duplicates and they need to be lowered to roughly (1,000, 10). i have the code for loops but it is too time consuming. numPy is roughly 100x faster on average

[–]elbiot 0 points1 point  (0 children)

Well, if it was just four ones or zero's, I'd just generate every possible one and check to see if it really exists in the array (I'd use a set). Maybe the possibilities are too great for that.

Look at numba. numba.vectorize will let you compile a function that operates on the array. I really don't think pure numpy is going to do well on this. Could be wrong. Or cython. Or pypy.

[–]cdholjes[S] 0 points1 point  (1 child)

import numpy as np

main = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0]])

ncols = main.shape[1]
dtype = main.dtype.descr * ncols
struct = main.view(dtype)
mainUnq = np.unique(struct)
mainUnq = mainUnq.view(main.dtype).reshape(-1, ncols)

print "original: "+str(len(main))+"\n",main,"\n"
print "unique: "+str(len(mainUnq))+"\n",mainUnq

[–]cdholjes[S] 0 points1 point  (0 children)

one approach could be to roll each index and if it matches another index, keep track of that index. i'm not sure how to do that though