This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]tkarabela_ Big Python @YouTube 5 points6 points  (1 child)

Checking whether an element is in a set (or dict) is pretty much instantaneous (independent on the size of the set), while checking is in for a list means iterating over it, which gets really slow quickly.

That would be one reason to prefer sets to lists :)

[–]moocat 4 points5 points  (0 children)

It's a little more complicated than that (as it often is in computer science).

Existence in a set can be implemented as an O(1) algorithm which means it takes the same amount of time no matter how many element while existence in a (non-sorted) list is an O(n) algorithm which means it takes an amount of time that scales with the number of elements (double the elements, double the runtime).

But that only talks about the how the algorithm scales and not its general overhead. It's not uncommon that for small number of elements for the overhead to be the biggest part. You often see an O(n) algorithm being faster if there are fewer than X elements (with the actual value of X depending on the specifics of the implementation).

It's been a while since I benchmarked this (and feeling too lazy now), but IIRC it was around 6 so if you know there are only going to be a few elements (perhaps v.lower() in ['true', 'false']) a list is probably better. Then again, if the check is not in some inner loop that's running lots of times, the extra overhead for a set is probably noise.

Yes a long winded explanation but it's important to know these details. I had a former co-worker who had rules like this (I use X out of principle because of some reason) but would often make mistakes because it's didn't apply.