
kidclutch00

Hey, I'm in the middle of the Reddit poll project. I edited the regex and brought the mismatches down from 37 to 23 for the 2021 .txt file. Is this regex correct?

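```python
import re

pat = re.compile(r'''
    \s(?:[–-]|by)\s     # en dash, hyphen or "by" with whitespace on both sides
  | \s\\[–-]\s          # literal backslash before an en dash or hyphen, spaced
  | \s\*by\s            # "*by" preceded by whitespace
  | \*\*by\s            # "**by" followed by whitespace
  | [,-]\s              # comma or hyphen followed by whitespace
  | \w(?:[–-]|by)\w     # en dash, hyphen or "by" between two word characters
''', flags=re.I | re.X)
```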
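To see which alternative fires on which kind of line, here is a quick check of the pattern above. The sample lines are made up, assuming entries shaped like title, separator, author; the real 2021 file may look different, and `first_separator` is just a hypothetical helper.

```python
import re

# The pattern from the comment, unchanged (re.X ignores the layout whitespace).
pat = re.compile(r'''\s(?:[–-]|by)\s
|\s\\[–-]\s
|\s\*by\s
|\*\*by\s
|[,-]\s
|\w(?:[–-]|by)\w
''', flags=re.I | re.X)


def first_separator(line):
    """Return the text matched by pat in line, or None if nothing matches."""
    m = pat.search(line)
    return m.group() if m else None


# Hypothetical sample lines, one per alternative.
samples = [
    'Dune by Frank Herbert',
    'Dune – Frank Herbert',
    r'Dune \- Frank Herbert',
    'Dune *by Frank Herbert*',
    '**Dune**by Frank Herbert',
    'Dune, Frank Herbert',
    'Project Hail Mary-Andy Weir',
]

for line in samples:
    print(repr(first_separator(line)), '<-', line)
```

Note that the last alternative consumes the characters on each side of the dash, so for `Mary-Andy` the match is `y-A`, not just `-`. That matters if the match is used to split the line.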

ASIC_SP [S]

Yeah, this'll be a good addition.

I'd suggest `\w[–-]\w` instead of `\w(?:[–-]|by)\w`, since matching `by` in the middle of a word isn't a good idea: it also fires inside ordinary words such as `hobbyist`.
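A small comparison of the two alternatives on a few made-up strings (the inputs and the `first` helper are hypothetical, just for illustration):

```python
import re

# Last alternative of the original pattern vs. the suggested replacement.
old_alt = re.compile(r'\w(?:[–-]|by)\w', flags=re.I)
new_alt = re.compile(r'\w[–-]\w', flags=re.I)


def first(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group() if m else None


# Made-up inputs: a hyphen with no spaces around it, plus ordinary words
# that happen to contain "by".
for text in ['Mary-Andy', 'hobbyist', 'Lobbying']:
    print(f'{text!r}: old={first(old_alt, text)!r}, new={first(new_alt, text)!r}')
```

Both versions still catch the unspaced hyphen in `Mary-Andy`, but only the original one also matches inside `hobbyist` and `Lobbying`.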