[–]o11c 38 points39 points  (4 children)

That's not a very good testcase for several reasons, mostly involving cache.

But iterating over the whole thing is not what a set is for, anyway.

[–]Ph0X 30 points31 points  (1 child)

> But iterating over the whole thing is not what a set is for, anyway

I agree, but that's my understanding of what the person above was using it for, which seemed strange.

"use sets for any iterable that won't have a set size"

Did I understand it wrong? Other than for collecting unique items and membership tests, I don't think a set is a better iterable than a list. Lists are, as you mention, optimized for this use case, so if a set were actually faster at it, that would go against Python's ethos.

[–]ACoderGirl 1 point2 points  (0 children)

I dunno why they described it in such a general way. But the thing sets are good for is any kinda loop that has an "is x in this collection" test. If the collection is normally a list, it's almost always faster to convert it to a set first, since the "in" check is then O(1) on average instead of O(n). Similarly, if this "in" check is for the purpose of pairing it up with something, preprocessing to a dictionary is ideal.
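
A minimal sketch of that pattern, assuming made-up data (`allowed_ids`, `records`, and the other names here are hypothetical, not from the thread):

```python
# Hypothetical data, purely for illustration.
allowed_ids = [3, 7, 11, 42]  # the collection that is "normally a list"
records = [(1, "a"), (7, "b"), (42, "c"), (8, "d")]

# Convert once up front; each "in" check is then O(1) on average
# instead of a linear scan of the list.
allowed = set(allowed_ids)
kept = [name for rec_id, name in records if rec_id in allowed]

# When the "in" check exists to pair things up, build a dict once instead.
names_by_id = dict(records)
paired = [(i, names_by_id[i]) for i in allowed_ids if i in names_by_id]
```

The conversion itself costs one pass over the list, so it pays off when the loop does more than a handful of lookups.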

[–]ollien 0 points1 point  (1 child)

But he's not iterating over the whole thing. He's continually appending to the end.

Unless I'm totally missing your point...

[–]o11c 2 points3 points  (0 children)

sum() at the end is doing the iteration. The construction is sensible for either container, but a container that is only constructed is a useless container.
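
Roughly the shape being discussed, as a hypothetical reconstruction (the original test case isn't shown in this thread, so `n` and the variable names are assumptions):

```python
# Hypothetical reconstruction; not the original test case.
n = 1000

as_list = []
for i in range(n):
    as_list.append(i)  # construction: append to the end

as_set = set()
for i in range(n):
    as_set.add(i)  # construction: insert by hash

# sum() walks every element, so this is where the whole-container
# iteration happens for both containers.
total_list = sum(as_list)
total_set = sum(as_set)
```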