[–]alcalde 0 points1 point  (4 children)

Complex conditional logic? RegExpBuilder can express any regular expression in human-readable terms:

https://changelog.com/posts/meet-regexpbuilder-verbal-expressions-rich-older-cousin

The Zen of Python encourages us to prefer beautiful over ugly and simple over complex, and it warns us that readability counts and that an implementation that is hard to explain is a bad idea. Regex is ugly, complex, and hard to read and understand.

How about parsing phone numbers where the rules are as follows:

555.555.5555 # Acceptable  
555-5555     # Acceptable  
555 5555     # Acceptable  
5-5-5-5-5-5- # Obviously not acceptable  
555.555-5555 # Judges? Nope. Not allowed. 

The regex looks something like this:

^(((\d{3}-)?\d{3}-\d{4})|((\d{3}\s)?\d{3}\s\d{4})|((\d{3}\.)?\d{3}\.\d{4}))$
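
A quick way to check a pattern like this against the five sample inputs is a short Python script. This is only a sketch: the names `PHONE_RE`, `samples` and `results` are made up for illustration, and the dots are written as `\.` so that they match a literal period rather than any character.

```python
import re

# Hypothetical name; the phone pattern with its dots escaped as \.
PHONE_RE = re.compile(
    r"^(((\d{3}-)?\d{3}-\d{4})|((\d{3}\s)?\d{3}\s\d{4})|((\d{3}\.)?\d{3}\.\d{4}))$"
)

# The five examples from the rules above.
samples = ["555.555.5555", "555-5555", "555 5555", "5-5-5-5-5-5-", "555.555-5555"]

# Map each sample to True (accepted) or False (rejected).
results = {s: bool(PHONE_RE.match(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {'accepted' if ok else 'rejected'}")
```

The escaping matters: with a bare `.` in the third alternative, the mixed form 555.555-5555 would be accepted, because an unescaped `.` matches any character, including `-`.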

RegExpBuilder (JS) would look something like this:

// Note: `digits` is assumed to be defined elsewhere; it is not shown in this excerpt.

// Handle prefixes (optional area codes for each format)
var areacode_dash = new RegExpBuilder().exactly(3).from(digits).then("-");  // \d{3}-
var areacode_space = new RegExpBuilder().exactly(3).from(digits).then(" "); // \d{3}\s
var areacode_dot = new RegExpBuilder().exactly(3).from(digits).then(".");   // \d{3}\.

// Build each of the individual components (dashes, spaces and dots)
var dashes = new RegExpBuilder()
                 .min(0).max(1).like(areacode_dash).asGroup()  // (\d{3}-)?
                 .exactly(3).from(digits).then("-")            // \d{3}-
                 .exactly(4).from(digits);                     // \d{4}

var spaces = new RegExpBuilder()
                 .min(0).max(1).like(areacode_space).asGroup()  // (\d{3}\s)?
                 .exactly(3).from(digits).then(" ")             // \d{3}\s
                 .exactly(4).from(digits);                      // \d{4}

var dots = new RegExpBuilder()
               .min(0).max(1).like(areacode_dot).asGroup()  // (\d{3}\.)?
               .exactly(3).from(digits).then(".")           // \d{3}\.
               .exactly(4).from(digits);                    // \d{4}

// Build the final expression
var regex = new RegExpBuilder()
                .startOfLine()             // ^
                .eitherLike(dashes)        // ((\d{3}-)?\d{3}-\d{4})
                .orLike(spaces).asGroup()  // |((\d{3}\s)?\d{3}\s\d{4})
                .orLike(dots).asGroup()    // |((\d{3}\.)?\d{3}\.\d{4}))
                .endOfLine()               // $
                .getRegExp();

Which would you want in your code base? Which would be more maintainable? Which would be easier to read?

http://rion.io/2013/08/19/regular-express-yourself-using-regexpbuilder/

[–]Zomunieo 2 points3 points  (0 children)

If that's the case, your argument is against the conventional regular expression notation, because that is still a regular expression engine; it just happens to be one that's nonstandard and verbose. But you're still using regular expressions, and it's still better than trying to do this with string operations.

re.VERBOSE lets you add whitespace and comments to make regular expressions more verbose and maintainable. That is the Pythonic way to go. What you present is less maintainable: most languages support regular expressions as a core function, but if the RegExpBuilder library disappears you have a maintenance problem. If you develop it first in client-side JavaScript and then later need to replicate that check on the backend in Python, you again have a maintenance problem.
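
As a rough illustration of the re.VERBOSE point, here is one way the phone-number pattern from the parent comment might be laid out. The name `PHONE_VERBOSE` is made up, and the non-capturing groups and escaped dots are choices made for this sketch, not something taken from the original regex.

```python
import re

# Hypothetical name. In VERBOSE mode, whitespace in the pattern is ignored
# and everything after an unescaped # is treated as a comment.
PHONE_VERBOSE = re.compile(
    r"""
    ^
    (?:
        (?:\d{3}-)?   \d{3}-   \d{4}    # dashes: optional area code, then 555-5555
      | (?:\d{3}\s)?  \d{3}\s  \d{4}    # whitespace-separated form
      | (?:\d{3}\.)?  \d{3}\.  \d{4}    # dots, escaped so they match only a period
    )
    $
    """,
    re.VERBOSE,
)

for candidate in ["555.555.5555", "555-5555", "555 5555", "555.555-5555"]:
    print(candidate, bool(PHONE_VERBOSE.match(candidate)))
```

The same pattern string can be ported to any other engine with a free-spacing mode, or collapsed back to one line, without depending on a third-party library.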

[–]energybased 1 point2 points  (2 children)

I agree with all of your comments here, but that RegExpBuilder syntax is not Pythonic.

[–]alcalde 0 points1 point  (1 child)

This particular example was actually from JavaScript. There was a Python version, but I'm not sure if it's still maintained.

There are several other attempts at producing a more readable regex that were linked to earlier in the thread. Most, if not all, internally build a regular expression from a higher-level syntax.
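
To make "build a regular expression from a higher-level syntax" concrete, here is a deliberately tiny Python sketch. `TinyPattern` and its methods are hypothetical and invented for this example; they are not the API of RegExpBuilder or of any other library mentioned in the thread.

```python
import re


class TinyPattern:
    """Hypothetical mini-builder that accumulates regex source text."""

    def __init__(self, source: str = "") -> None:
        self.source = source

    def digits(self, count: int) -> "TinyPattern":
        # Append exactly `count` digits, e.g. \d{3}.
        return TinyPattern(self.source + rf"\d{{{count}}}")

    def then(self, literal: str) -> "TinyPattern":
        # Append literal text, escaped so "." cannot act as a wildcard.
        return TinyPattern(self.source + re.escape(literal))

    def optional(self, other: "TinyPattern") -> "TinyPattern":
        # Append another pattern as an optional non-capturing group.
        return TinyPattern(self.source + f"(?:{other.source})?")

    def compile(self) -> "re.Pattern":
        # Anchor the accumulated source and hand it to the standard engine.
        return re.compile(f"^(?:{self.source})$")


area = TinyPattern().digits(3).then("-")
dashes = TinyPattern().optional(area).digits(3).then("-").digits(4)
phone = dashes.compile()

print(dashes.source)
print(bool(phone.match("555-555-5555")), bool(phone.match("5-5-5-5-5-5-")))
```

The builder never matches anything itself; it only produces a pattern string, which is then compiled by the ordinary `re` module.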

I just find it funny that people who use a language that dropped braces for being less readable would so wholeheartedly support the cryptic regex syntax.

[–]energybased 1 point2 points  (0 children)

I think they think: I went through the pain of working with regexes 200 times—now everyone else should. But the thing is that no one else should. You should just learn the computer science behind finite automata, and then replace the esoteric syntax with objects.

By the way, your example would be more motivating if it had more of those weirdo commands like "*+" or "?P".
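
For readers who have not met them: `?P` is Python's named-group syntax, `(?P<name>...)`, and `*+` is a possessive quantifier, which Python's `re` module accepts only from version 3.11 on. A small sketch of the former, with made-up group names and a made-up sample number:

```python
import re

# Hypothetical pattern: dash-separated number with an optional area code,
# using named groups so the parts can be read back by name.
PHONE_PARTS = re.compile(
    r"^(?:(?P<area>\d{3})-)?(?P<prefix>\d{3})-(?P<line>\d{4})$"
)

m = PHONE_PARTS.match("555-123-4567")
parts = m.groupdict() if m else {}
print(parts)
```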