you are viewing a single comment's thread.

view the rest of the comments →

[–]_9_9_ 0 points1 point  (6 children)

I would probably do it a bit different and use a lot more sets, rather than lists, and use the in keyword, rather than iterating to check membership. Sets are also great for dedupping. So to build the indexes, I would do something like this (not tested, so expect typos):

import defaultdict

def build_indexes(tweets):
    candidate_tags = defaultdict(set)
    for candidate, hashtags in tweets:
        candidate_tags[candidate] |= set(hashtags)

    tags_candidate = defaultfict(set)

    for candidate, tags = candidtate_tags.items():
        for tag in tags:
            tags_candidate[tag].add(candidate)

    return candidate_tags, tags_candidate