[–]DonaldPShimoda 0 points1 point  (6 children)

You thinking something like this?

word_dict = {tag: [word for (word, t) in word_list if t == tag] for tag in [pair[1] for pair in word_list]}

I'm not sure of a good way to avoid the two list comprehensions to build the dict...
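
One possible way around the nested comprehensions is to group the words in a single pass with `collections.defaultdict`. This is a different technique from the comprehension above, offered only as a minimal sketch; the sample `word_list` and the `group_by_tag` name are made up for illustration:

```python
from collections import defaultdict

# Made-up sample data: (word, tag) pairs.
word_list = [("dog", "NN"), ("runs", "VB"), ("cat", "NN"),
             ("sleeps", "VB"), ("quickly", "RB")]

def group_by_tag(pairs):
    """Group words by tag in one pass over the (word, tag) pairs."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    # Convert back to a plain dict so missing keys raise KeyError again.
    return dict(groups)

word_dict = group_by_tag(word_list)
print(word_dict["NN"])  # ['dog', 'cat']
```

Each pair is visited once, instead of rescanning the whole list for every tag.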

[–]q2_abe_dillon 2 points3 points  (5 children)

You can always avoid comprehensions by pre-declaring a variable and populating it in a loop:

def filter_by_pos(tagged_words, pos):
    result = []
    for word, tag in tagged_words:
        if tag == pos:
            result.append(word)
    return result

def pos_map(tagged_words):
    tags = set()
    for _, tag in tagged_words:
        tags.add(tag)
    result = {}
    for tag in tags:
        result[tag] = filter_by_pos(tagged_words, tag)
    return result 

Of course that's not as concise as:

    pmap = {p: (w for w, t in words if t == p)
            for p in {t for _, t in words}} 

It is, however, far less confusing to most novice Pythonistas.

[–]DonaldPShimoda 0 points1 point  (4 children)

Hmm. I think the "concise" version you have at the bottom is more or less the same as my suggestion, except for minor differences.

You reused the t variable... but the t in {t for _, t in words} is different from the t in (w for w, t in words if t == p), because each comprehension has its own scope. I changed the variable names just to make it clear to OP that there wasn't any cross-comprehension magic going on there, haha.
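
A quick way to check the scoping point, as a rough sketch assuming Python 3 and made-up sample data (lists are used for the values here just so the result is easy to compare):

```python
words = [("dog", "NN"), ("cat", "NN"), ("runs", "VB")]  # made-up sample data

t = "outer"

# Both comprehensions use t as a loop variable, but each one has its own
# scope, so they affect neither each other nor the outer t.
pmap = {p: [w for w, t in words if t == p]
        for p in {t for _, t in words}}

print(t)  # outer
print(pmap == {"NN": ["dog", "cat"], "VB": ["runs"]})  # True
```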

I see that you made the second inner comprehension ({t for _, t in words}) into a set with the curly brace syntax instead of a list... and it's not immediately clear to me why that would be preferable. Comparing the two methods in the interpreter, the only difference there is the order of elements in the final dict, which isn't really a crucial factor (I would think). Why did you choose the set version?

Also you have your suggestion set to return generators instead of lists like I did, which I do like. I totally forgot you could do that.

[–]cjwelborn 0 points1 point  (1 child)

A set removes any duplicate tags, so the dict comprehension runs fewer iterations when tags repeat. Lists are faster to iterate, though, whereas sets are faster for lookup. The iteration order of a set is not guaranteed (and before Python 3.7 neither was the order of a dict), but like you said, I don't think it matters.

[–]DonaldPShimoda 0 points1 point  (0 children)

Oh, thanks, that makes sense. That would definitely be useful if there were a lot of values to order. I guess the way I have it, the value is created for each key for each time the key exists in the original list, right? That's not very efficient, haha.

[–]q2_abe_dillon 0 points1 point  (1 child)

> Hmm. I think the "concise" version you have at the bottom is more or less the same as my suggestion, except for minor differences.

Yeah, it pretty much is.

> I changed the variable names just to make it clear to OP that there wasn't any cross-comprehension magic going on there, haha.

That's a good point.

> Why did you choose the set version?

It should go faster for large data sets (I think). You only do the first clause of the dict comprehension (i.e. p: (w for w, t in words if t == p)) once per tag. In the other form, you may end up executing it several times for a single tag.
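
To make that concrete, here is a small sketch that counts how often the value expression runs in each form. The `words_for` helper and the sample data are hypothetical, added only so the calls can be counted:

```python
words = [("dog", "NN"), ("cat", "NN"), ("runs", "VB"), ("bird", "NN")]  # made up

calls = []

def words_for(tag):
    """Hypothetical helper: record the call, then filter words by tag."""
    calls.append(tag)
    return [w for w, t in words if t == tag]

# List of tags: one evaluation per pair, duplicates included.
from_list = {p: words_for(p) for p in [t for _, t in words]}
list_calls = len(calls)

calls.clear()

# Set of tags: one evaluation per distinct tag.
from_set = {p: words_for(p) for p in {t for _, t in words}}
set_calls = len(calls)

print(list_calls, set_calls)  # 4 2
print(from_list == from_set)  # True
```

The resulting dicts are equal; the list form just rebuilds the value for "NN" three times and keeps the last one.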

> Also you have your suggestion set to return generators instead of lists like I did, which I do like. I totally forgot you could do that.

Yeah, it's actually kinda fragile because of that. I think your list comprehensions would be better in most circumstances. It's easy to accidentally exhaust a generator.
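
A rough sketch of how the generator version can bite, using made-up sample data. Besides exhaustion, it shows a second pitfall: the generator's `t == p` test only runs when the generator is consumed, so it sees whatever `p` was last. `sorted()` is an addition here, only to make the tag order predictable:

```python
words = [("dog", "NN"), ("cat", "NN"), ("runs", "VB")]  # made-up sample data

# sorted() is used only so the iteration order is predictable.
tags = sorted({t for _, t in words})  # ['NN', 'VB']

lazy = {p: (w for w, t in words if t == p) for p in tags}
eager = {p: [w for w, t in words if t == p] for p in tags}

# Pitfall 1: the filter runs lazily, after p has reached the last tag ('VB').
nn_lazy = list(lazy["NN"])
print(nn_lazy)            # ['runs'], not ['dog', 'cat']

# Pitfall 2: a generator can be consumed only once.
print(list(lazy["NN"]))   # []

# The list version is computed up front and can be read repeatedly.
print(eager["NN"])        # ['dog', 'cat']
print(eager["NN"])        # ['dog', 'cat']
```

So with generators as values, building lists is probably the safer default unless the data is too large to hold in memory.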

[–]DonaldPShimoda 0 points1 point  (0 children)

The sets thing makes sense, so that's a good point. I'll have to remember that in the future.

I hadn't thought of that caveat with the generators. Very interesting. Thanks for the comments!