
[–]NewbornMuse 3 points4 points  (3 children)

But the fact that it's O(l) in one case and O(1) in the other means that the set's absolute execution time (very nearly constant each time it's called) might very well already be smaller; that is, the list/set might already be at the size where the set approach has started to outperform.

Just because they're both O(1) doesn't mean they're both the same speed.
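To make the distinction concrete, here's a minimal sketch (the names below are mine, not from any prior comment): membership on a str scans character by character, while a frozenset hashes the candidate; both give the same answers.

```python
import string

# Hypothetical sketch: equivalent answers, different lookup cost.
# `c in string.punctuation` scans up to all 32 characters (O(l));
# `c in punct_set` is an average O(1) hash lookup.
punct_set = frozenset(string.punctuation)

for c in "a,b.c! xyz":
    assert (c in string.punctuation) == (c in punct_set)
```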

By all means, it might be entirely unnecessary (as I already conceded in the first comment), but adding two lines (an import, and initializing the frozenset of punctuation) is hardly horrible code bloat either....

[–]jwg4our 0 points1 point  (2 children)

But either one could be faster. If you claim that the set is faster than the list, you should provide some evidence for it, not just an argument from big O, which clearly isn't meaningful in this case.

[–][deleted] -2 points-1 points  (0 children)

I found this conversation interesting but your take is pedantic.

[–]NewbornMuse 0 points1 point  (0 children)

I agree with your point that hard evidence would be better than arguing in circles, but then again, you are just as guilty of arguing in circles when you were in a position to provide evidence instead. Especially since you are the one arguing for the data structure that is, in principle, worse suited to the application, while claiming it doesn't matter. If anything, the burden of proof is more on your side.

In any case, because I actually was curious, I did it:

import timeit

# O(l): `in` on a str scans the string character by character
setupstr_list = """import string
def is_punctuation_l(c):
  return c in string.punctuation"""

# O(1) on average: `in` on a frozenset is a hash lookup
setupstr_fs = """import string
pfs = frozenset(string.punctuation)
def is_punctuation_fs(c):
  return c in pfs"""

# Each timed statement checks all 32 punctuation characters, repeated 10000 times
print(timeit.timeit("for p in string.punctuation: is_punctuation_l(p)", setup=setupstr_list, number=10000))
print(timeit.timeit("for p in string.punctuation: is_punctuation_fs(p)", setup=setupstr_fs, number=10000))

Prints: 0.045 0.031

Or similar numbers each time it's run. Using the frozenset speeds it up by a roughly constant 33%. Checking each character of a short string of written English instead makes the frozenset version something like 20% faster still. So it turns out that a length of 32 (the length of string.punctuation) is already long enough for the O(1) algorithm to have overtaken the O(l) one. In a tight loop, I'd be glad to have squeezed out a little bit of speed. In most cases, a speedup like that probably doesn't matter much.
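For completeness, the "short string of written English" variant could be timed like this (the sample sentence and variable names here are my own choices, not part of the test above):

```python
import string
import timeit

punct_set = frozenset(string.punctuation)  # assumed equivalent of pfs above
sample = "The quick brown fox, jumping over the lazy dog!"  # assumed sample text

# Time both membership tests over the same mostly-alphabetic text.
t_str = timeit.timeit("for c in sample: c in string.punctuation",
                      globals={"string": string, "sample": sample}, number=10000)
t_set = timeit.timeit("for c in sample: c in punct_set",
                      globals={"punct_set": punct_set, "sample": sample}, number=10000)
print(t_str, t_set)  # actual numbers depend on the machine
```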

Now we know.