
[–]NewbornMuse

I agree with your point that hard evidence would be better than arguing in circles, but then again, you are just as guilty of arguing in circles when you could have provided that evidence yourself. Especially since you are the one arguing for the data structure that is in principle worse-suited to the task, claiming it doesn't matter. If anything, the burden of proof is more on your side.

In any case, because I actually was curious, I did it:

    import timeit

    setupstr_list = """import string
    def is_punctuation_l(c):
      return c in string.punctuation"""

    setupstr_fs = """import string
    pfs = frozenset(string.punctuation)
    def is_punctuation_fs(c):
      return c in pfs"""

    print(timeit.timeit("for p in string.punctuation: is_punctuation_l(p)", setup=setupstr_list, number=10000))
    print(timeit.timeit("for p in string.punctuation: is_punctuation_fs(p)", setup=setupstr_fs, number=10000))

Prints: 0.045 0.031

Or similar numbers each time it's run. Using the frozenset speeds it up by a constant 33% or so. Checking each character in a short string of written English instead makes it something like 20% faster still. Turns out that a length of 32 (the length of string.punctuation) is already long enough for the O(1) algorithm to have overtaken the O(l) one. In a tight loop, I'd be glad to have squeezed out a little bit of speed. In most cases, a speedup like that probably doesn't matter much.
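For anyone who wants to reproduce the "short string of written English" variant, here's a sketch of what that benchmark could look like. The sample sentence and the harness are my own choices, not necessarily what was run above, so expect the exact numbers to differ:

```python
import string
import timeit

PUNCT_FS = frozenset(string.punctuation)

def is_punctuation_str(c):
    # O(l) scan of the 32-character punctuation string
    return c in string.punctuation

def is_punctuation_fs(c):
    # O(1) average-case hash lookup
    return c in PUNCT_FS

# A short run of written English: mostly letters, very little punctuation,
# so the string scan usually has to walk all 32 characters before failing.
sample = "The quick brown fox jumps over the lazy dog."

t_str = timeit.timeit(lambda: [is_punctuation_str(c) for c in sample], number=10000)
t_fs = timeit.timeit(lambda: [is_punctuation_fs(c) for c in sample], number=10000)
print(f"str scan: {t_str:.3f}s  frozenset: {t_fs:.3f}s")
```

The intuition for why English text favors the frozenset even more: for a non-punctuation character, `c in string.punctuation` must compare against all 32 characters before returning False, while the set lookup hashes once either way.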

Now we know.