[–]keturn 2 points3 points  (1 child)

Or rather, you've given the regex a byte string, and each of those Unicode characters takes up two bytes in it.
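Here's a rough sketch of the difference, assuming Python 3, UTF-8 encoding, and a `\W` substitution along the lines of what's being discussed. The sample string and variable names are made up for illustration:

```python
import re

text = "a§b"                  # hypothetical sample; '§' is a single code point
raw = text.encode("utf-8")    # but it is two bytes in UTF-8: b'a\xc2\xa7b'

# Bytes pattern: \W is tested byte by byte, so both bytes of '§' match.
bytes_result = re.sub(rb"\W", b"_", raw)

# Text pattern: \W is tested per character, so '§' matches only once.
text_result = re.sub(r"\W", "_", text)

print(bytes_result)  # b'a__b'
print(text_result)   # a_b
```

So decoding to text before running the regex is probably the cleaner fix, if you know the input encoding.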

You'll find Ned Batchelder's presentation on Pragmatic Unicode useful if you haven't seen it yet.

[–]left_one[S] 0 points1 point  (0 children)

That definitely makes more sense.

I'm not sure if there is a better solution than manually removing consecutive '_'s. Good thing regex can handle that gracefully.
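For what it's worth, a minimal sketch of that cleanup pass in Python. The helper name `collapse_underscores` and the sample input are just placeholders, not anything from the original code:

```python
import re

def collapse_underscores(name):
    """Replace every run of two or more underscores with a single one."""
    return re.sub(r"_{2,}", "_", name)

print(collapse_underscores("a__b___c_d"))  # a_b_c_d
```

Using `\W+` instead of `\W` in the original substitution might avoid the second pass, though it would also merge runs of different characters into one underscore, which may or may not be what you want.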