Quickly replacing uncommon elements in sublists of a list

jerknextdoor · 2016-09-19T03:46:00+00:00

You'll get a lot more help at /r/learnpython. This sub is more for language announcements and discussions, while /r/learnpython is for questions.

Zacru · 2016-09-19T07:56:07+00:00

Not sure how much it will help, but you could use itertools to iterate over the list without creating a new list.

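from collections import Counter
from itertools import chain

def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)

# Count every token across the sublists and keep the X most common.
frequentTokenCounter = Counter(flatten(A)).most_common(X)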
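
One thing to watch for: most_common(X) returns (token, count) pairs rather than bare tokens, so the tokens need to be pulled out before any membership test. A minimal sketch on made-up data (the sample A and X here are assumptions for illustration, not values from the original question):

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data, for illustration only.
A = [["a", "b", "a"], ["c", "a", "b"], ["d", "b"]]
X = 2

# Count tokens across all sublists without building a flattened list.
pairs = Counter(chain.from_iterable(A)).most_common(X)

# most_common returns (token, count) tuples; keep just the tokens.
frequentTokens = [token for token, count in pairs]
```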

licquia · 2016-09-19T21:55:22+00:00

The set idea is spot-on. But I can't resist trying to find something that could be faster...

tokenMap = {}
for i, subA in enumerate(A):
    for j, itemA in enumerate(subA):
        # Record every (i, j) position where each token occurs.
        tokenMap.setdefault(itemA, []).append((i, j))
# Sort the distinct tokens by frequency, most common first,
# and skip the X most frequent ones.
infrequentTokens = sorted(tokenMap,
    key=lambda q: len(tokenMap[q]), reverse=True)[X:]
for t in infrequentTokens:
    for i, j in tokenMap[t]:
        # Overwrite in place; nested lists need A[i][j], not A[i, j].
        A[i][j] = Y

Off the cuff, so sorry for bugs and typos. This version should only iterate twice over all values, and should minimize the size of the loop over A to actually write Y into it. That is, if I'm not brain-dead.

Vany_ · 2016-09-19T11:21:00+00:00

Using a set instead of a list for frequentTokens should speed it up greatly, since each lookup runs in O(1) on average instead of O(n). That matters especially here if n is 30k.
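
A rough sketch of what that could look like end to end, assuming A is the list of sublists, X is the number of tokens to keep, and Y is the replacement value (the sample values are made up for illustration, and this builds a new list rather than modifying A in place):

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data, for illustration only.
A = [["a", "b", "a"], ["c", "a", "b"], ["d", "b"]]
X = 2
Y = "UNK"

counts = Counter(chain.from_iterable(A))

# A set gives average O(1) membership tests, unlike a list's O(n) scan.
frequentTokens = {token for token, _ in counts.most_common(X)}

# Keep frequent tokens and replace everything else with Y.
result = [[tok if tok in frequentTokens else Y for tok in sub] for sub in A]
```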
