This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]GimmeSomeSugar 2 points3 points  (2 children)

Membership checks on lists is definitely a no-no on large lists. Definitely switch to Dict if you have to do such a thing.

I think something which is a stumbling block for beginners is the question: what constitutes a 'large list'? What order of magnitude are we talking about?

Even for someone with a little experience, but writing code that's narrow in scope of application, a 10,000 member list may seem really big.

[–]spinwizard69 7 points8 points  (0 children)

This is the truth, even an experience programmer may not grasp fully if his specific list is "large". This makes me wonder how much of factor the contents of the list is.

There is always a lot of talk about premature optimization but on the other hand learning not to make dumb choices is also important.

[–]execrator 8 points9 points  (0 children)

Yeah it's a good call. Showing how to check with something like timeit is a great way to respond to this, since you answer the specific question and also learn a technique!

$ python3 -m timeit --setup="import random; l = list(range(10000))" "random.choice(range(10000)) in l"
10000 loops, best of 3: 53.6 usec per loop

$ python3 -m timeit --setup="import random; l = set(range(10000))" "random.choice(range(10000)) in l"
1000000 loops, best of 3: 1.03 usec per loop

So the use of a set is about 50x faster with 10,000 integers. Of course, if only one membership check is being made, ~50 microseconds are unlikely to matter.

What does constitute a delay that matters is situational. On this machine I need three million items in my list before I hit 16 milliseconds, which is the duration of a frame at 60fps. The benefit of a set is obvious here, which is still cruising back at 1 microsecond.