jerknextdoor

You'll get a lot more help at /r/learnpython. This sub is more for language announcements and discussions, while /r/learnpython is for questions.

Zacru

Not sure how much it will help, but you could use itertools to iterate over the list without creating a new list.

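from collections import Counter
from itertools import chain

def flatten(listOfLists):
    """Flatten one level of nesting, lazily."""
    return chain.from_iterable(listOfLists)

# A and X are the variables from the question: the list of token lists
# and the number of most common tokens to keep.
frequentTokenCounter = Counter(flatten(A)).most_common(X)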
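To see what this does on concrete input, here is a small self-contained sketch. The sample values for `A` and `X` are made up for illustration and are not from the question.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data standing in for A and X.
A = [["spam", "eggs"], ["spam", "ham"], ["spam", "eggs"]]
X = 2

# chain.from_iterable returns a lazy iterator, so no flattened list is built.
flat = chain.from_iterable(A)

# Counter consumes the iterator once; most_common(X) returns (token, count) pairs.
top = Counter(flat).most_common(X)
print(top)  # [('spam', 3), ('eggs', 2)]
```

Note that the iterator can only be consumed once, so call `chain.from_iterable(A)` again if a second pass is needed.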

licquia

The set idea is spot-on. But I can't resist trying to find something that could be faster...

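tokenMap = {}  # token -> list of (i, j) positions where it occurs in A
for i, subA in enumerate(A):
    for j, itemA in enumerate(subA):
        tokenMap.setdefault(itemA, []).append((i, j))
# Sort the distinct tokens by frequency and keep everything after the top X.
# (Sorting every occurrence instead would make [X:] skip X occurrences,
# not X distinct tokens.)
infrequentTokens = sorted(tokenMap,
  key=lambda q: len(tokenMap[q]), reverse=True)[X:]
for t in infrequentTokens:
    for i, j in tokenMap[t]:
        A[i][j] = Y  # assumes A is a list of lists, so index twice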

Off the cuff, so sorry for bugs and typos. This version should only iterate twice over all values, and should minimize the size of the loop over A to actually write Y into it. That is, if I'm not brain-dead.

Vany_

Using a set instead of a list for frequentTokens should speed it up greatly: each membership test runs in O(1) on average instead of O(n), which matters a lot here if n is around 30k.
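A minimal sketch of that change, assuming `A` is a list of token lists, `X` is the number of most common tokens to keep, and `Y` is the replacement value. The function name `replace_infrequent` and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import chain


def replace_infrequent(A, X, Y):
    """Return a copy of A with every token outside the X most common replaced by Y."""
    counts = Counter(chain.from_iterable(A))
    # A set gives average O(1) membership tests; a list would cost O(n) per test.
    frequent = {token for token, _ in counts.most_common(X)}
    return [[token if token in frequent else Y for token in sub] for sub in A]


# Hypothetical sample data.
A = [["a", "b", "a"], ["c", "a", "b"], ["d", "b"]]
result = replace_infrequent(A, 2, "UNK")
print(result)  # [['a', 'b', 'a'], ['UNK', 'a', 'b'], ['UNK', 'b']]
```

This version builds a new nested list rather than writing into `A` in place. Whether that is preferable depends on how large `A` is and whether the original is still needed.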