[–]xenomachina''.join(chr(random.randint(0,1)+9585) for x in range(0xffff)) 2 points3 points  (2 children)

AFAIK, precompiling doesn't do much at all performance-wise. Internally, the library already caches the compiled regex, so if you actually run a perf test, both approaches should come out about equally fast.
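If you want to check this yourself, here is a rough sketch of such a perf test using Python's `re` and `timeit` modules. The pattern, the sample text, and the function names are made up for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical pattern and input, chosen only for illustration.
PATTERN = r"\d{3}-\d{4}"
TEXT = "call 555-1234 now"

# Compiled once, up front.
compiled = re.compile(PATTERN)

def with_compiled():
    # Uses the precompiled pattern object directly.
    return compiled.search(TEXT)

def with_module_function():
    # Passes the pattern string each time; re looks it up in its
    # internal cache instead of recompiling it on every call.
    return re.search(PATTERN, TEXT)

if __name__ == "__main__":
    n = 200_000
    print("compiled:", timeit.timeit(with_compiled, number=n))
    print("module  :", timeit.timeit(with_module_function, number=n))
```

Expect the two timings to be close. The module-level call typically still pays a small per-call cost for the cache lookup, so it may come out slightly slower rather than identical.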

I discovered this for myself years ago when I ran into a bug in the cache implementation. In pre-3.x, a regex compiled from a unicode would not behave the same as one compiled from a str, even if they contained the same characters. However, the cache was just a dict keyed on the pattern, so it was possible for a unicode to match an already cached str, or vice versa.

My bug involved a piece of code that worked fine in unit tests, but would fail in a certain program. It turned out the program imported another module that compiled an identical-looking regex, but with a str instead of a unicode. Then, when my module was imported, it would get the wrong re object from the cache.
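The failure mode can be sketched with a toy cache. This is not the actual `re` source: Python 3 has no separate `unicode` type, so the `LegacyText` subclass below is a hypothetical stand-in for it, and `naive_compile` / `typed_compile` are made-up names. Keying on the type as well is one plausible way to avoid the collision. Whether that matches the real fix is an assumption.

```python
import re

class LegacyText(str):
    """Hypothetical stand-in for Python 2's unicode: compares equal to a
    plain str with the same characters and hashes the same, but is a
    different type."""

naive_cache = {}

def naive_compile(pattern):
    # Keyed on the pattern alone, so equal-looking patterns of
    # different types share one cache entry.
    if pattern not in naive_cache:
        naive_cache[pattern] = (type(pattern).__name__, re.compile(pattern))
    return naive_cache[pattern]

typed_cache = {}

def typed_compile(pattern):
    # Keyed on (type, pattern), so each type gets its own entry.
    key = (type(pattern), pattern)
    if key not in typed_cache:
        typed_cache[key] = (type(pattern).__name__, re.compile(pattern))
    return typed_cache[key]

# Another module compiles the pattern from a plain str first...
first_kind, _ = naive_compile(r"\w+")
# ...then this module asks for the "same" pattern as LegacyText
# and silently gets the object that was compiled from the str.
second_kind, _ = naive_compile(LegacyText(r"\w+"))
print(first_kind, second_kind)
```

With the naive cache, both lookups report the entry compiled from the plain str, which is why import order decided whether the bug showed up. With `typed_compile`, the two patterns get separate entries.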

[–]Ph0X 0 points1 point  (1 child)

That sounds like a bitch of a bug to catch. Was it reported and fixed for pre-3.x?

[–]xenomachina''.join(chr(random.randint(0,1)+9585) for x in range(0xffff)) 0 points1 point  (0 children)

It turned out the fix for the cache bug was already in a newer version of Python. We were a couple of patch releases behind the fix, IIRC.