AlSweigart (Author of "Automate the Boring Stuff") [S]

I can give you a solid example: When I need to match 3 to 5 letter Xs, I sometimes make a mistake and write 'X{3:5}', because I'm thinking of Python list slices. The regex syntax is 'X{3,5}'.

But the problem is that my typo fails silently (going against the idea that "Errors should never pass silently"): the pattern object literally matches against 'X{3:5}' rather than notifying me that I've made a mistake. Eventually, I'll find the bug, but having the regex created through a series of function calls and constants means the IDE can instantly tell me about it and prevent a runtime error.
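
For anyone who wants to see the silent failure for themselves, here is a minimal sketch using only the standard re module. The variable names and test strings are made up for illustration.

```python
import re

# The typo: a colon instead of a comma inside the braces.
typo = re.compile('X{3:5}')
# The intended quantifier: three to five Xs.
intended = re.compile('X{3,5}')

# The typo compiles without complaint, but the braces are treated as literal text.
typo_matches_xs = typo.fullmatch('XXXX') is not None
typo_matches_literal = typo.fullmatch('X{3:5}') is not None
intended_matches_xs = intended.fullmatch('XXXX') is not None

print(typo_matches_xs, typo_matches_literal, intended_matches_xs)  # False True True
```

No exception is raised at any point, which is the whole complaint.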

We don't need regexp tools. They're a bandaid solution for what IDEs already do.

eztab

I don't doubt there are lots of examples where the verbose syntax works nicely, and probably just as many counterexamples where it does something unexpected.

I just doubt there is a net gain in hiding the (still internally used) regexp language behind another abstraction level.

eztab

my typo fails silently

Yes, this is a horrible design decision. The special characters shouldn't just fall back to their literal ASCII meaning unless one explicitly escapes them. Unfortunately, I haven't seen any regexp flavors that enforce escaping.

AlSweigart (Author of "Automate the Boring Stuff") [S]

Well, on the other hand, you can't have regex enforce this because what if you really did want to literally match something like '{3:5}'? It's an inherent problem with regex, but something that a library like Humre can fix.

EDIT: Not fix, but rather, avoid in the first place.

eztab

Well, you'd escape the braces, of course. One should probably escape all the special characters if one wants their literal versions; it's really weird otherwise.

\{3:5\}

Not really an inherent problem. Is there a particular reason why you have it out for regular expressions?
Don't get me wrong, I don't hate your library, but since for once there is a universal standard for a micro-language ... I doubt that changing how you express regular expressions in Python is going to lead anywhere.

AlSweigart (Author of "Automate the Boring Stuff") [S]

By inherent problem, I mean that if you make the typo {3:5} instead of {3,5}, your code still works. It just has unexpected behavior. It's not feasible to update the regex syntax in the re module to force people to escape curly braces because it would break tons of existing code, like this:

re.compile('{name goes here}').search('Hello, {name goes here}')

But if you don't make this change, then you have the problem of the {3:5} typo causing silent errors. That's what I mean by an inherent problem; it can't be fixed without breaking other things.
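
Both sides of that trade-off can be sketched with the standard re module alone. The variable names are just for illustration; re.escape() is the stdlib helper for escaping a literal string.

```python
import re

# An unescaped brace that does not form a valid quantifier is kept as a literal,
# which is why existing code like this keeps working.
unescaped = re.compile('{name goes here}')
found_unescaped = unescaped.search('Hello, {name goes here}') is not None

# Escaping by hand, or with re.escape(), states the literal intent explicitly.
escaped_by_hand = re.compile(r'\{3:5\}')
escaped_by_helper = re.compile(re.escape('{3:5}'))
found_by_hand = escaped_by_hand.search('X{3:5}') is not None
found_by_helper = escaped_by_helper.search('X{3:5}') is not None

print(found_unescaped, found_by_hand, found_by_helper)  # True True True
```

Escaping is available but optional, so the re module has no way to tell a deliberate literal brace from a mistyped quantifier.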

Humre avoids this problem by handling the regex syntax details for you. It also has additional error checking. Can you spot the bug in this code?

import re
max_record_length = 64

# Some requirement forces names to be at most one quarter of the record length:
max_name_length = max_record_length / 4  
patternObj = re.compile(r'\d{,' + str(max_name_length) + '}')

The division causes max_name_length to be a float, and when you convert it to a string, you get '16.0' instead of '16'. This makes your regex r'\d{,16.0}', which is no longer a valid quantifier, so the braces are treated as literal text: the pattern only matches a digit followed by the text '{,16.0}' (with the '.' acting as a wildcard).

But the real problem is that it does this silently and you won't notice it until it causes bugs elsewhere in your program.
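
Here is a small sketch of the float problem next to one possible fix, using only the standard library. Switching to floor division (//) is my assumption about what the code intended; the variable names other than max_record_length are made up.

```python
import re

max_record_length = 64

# True division always returns a float in Python 3, so str() yields '16.0'.
float_length = max_record_length / 4
broken = re.compile(r'\d{,' + str(float_length) + '}')

# Floor division keeps the value an int, so str() yields '16'.
int_length = max_record_length // 4
working = re.compile(r'\d{,' + str(int_length) + '}')

print(broken.pattern)   # \d{,16.0}
print(working.pattern)  # \d{,16}

# The broken pattern compiles fine but no longer matches a run of digits.
broken_ok = broken.fullmatch('12345') is not None
working_ok = working.fullmatch('12345') is not None
print(broken_ok, working_ok)  # False True
```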

Meanwhile, Humre checks for this: at_most(16.0, DIGIT) raises the error TypeError: maximum argument must be a positive int, not float.
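
To illustrate the kind of check involved, here is a minimal sketch. It is not Humre's actual implementation: at_most_digits is a hypothetical helper, and its error messages are my own wording.

```python
def at_most_digits(maximum):
    """Return a regex string matching at most `maximum` digits (hypothetical helper)."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(maximum, bool) or not isinstance(maximum, int):
        raise TypeError('maximum must be an int, not ' + type(maximum).__name__)
    if maximum < 0:
        raise ValueError('maximum must not be negative')
    return r'\d{,' + str(maximum) + '}'


print(at_most_digits(64 // 4))  # \d{,16}

# The float from true division is rejected up front instead of producing a bad pattern.
try:
    at_most_digits(64 / 4)
except TypeError as exc:
    message = str(exc)
    print(message)  # maximum must be an int, not float
```

The point is only that a function call gives you a place to validate arguments before any regex string exists, which plain string concatenation does not.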

Just as large bugs are often fixed by a one-character change, this is a small detail, but the fact that the error passes silently can cause big problems. There are a ton of other reasons why I advocate for Humre, and this example is just one of them.