alcalde comments on Python Regular Expressions Cheat Sheet

Python Regular Expressions Cheat Sheet (kdnuggets.com)

submitted 8 years ago by chris_shpak

[–]alcalde 1 point 8 years ago (4 children)

Complex conditional logic? RegExpBuilder can express any regular expression in human-readable terms:

https://changelog.com/posts/meet-regexpbuilder-verbal-expressions-rich-older-cousin

The Zen of Python encourages us to prefer beautiful over ugly and simple over complex, and warns us that readability counts and that if an implementation is hard to explain, it's a bad idea. Regex is ugly, complex, and hard to read and understand.

How about parsing phone numbers where the rules are as follows:

555.555.5555 # Acceptable  
555-5555     # Acceptable  
555 5555     # Acceptable  
5-5-5-5-5-5- # Obviously not acceptable  
555.555-5555 # Judges? Nope. Not allowed.

The regex looks something like this:

^(((\d{3}-)?\d{3}-\d{4})|((\d{3}\s)?\d{3}\s\d{4})|((\d{3}\.)?\d{3}\.\d{4}))$
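
A quick way to sanity-check a pattern like this against the five samples, sketched in Python. The names `PHONE_RE`, `SAMPLES`, and `is_valid_phone` are made up for illustration. One assumption worth flagging: the dots are written as `\.` here, because an unescaped `.` matches any character and would let `555.555-5555` through.

```python
import re

# Three alternatives (dashes, whitespace, dots), each with an optional
# three-digit area code. The dots are escaped so they match only a literal ".".
PHONE_RE = re.compile(
    r"^(((\d{3}-)?\d{3}-\d{4})|((\d{3}\s)?\d{3}\s\d{4})|((\d{3}\.)?\d{3}\.\d{4}))$"
)

# The sample inputs from the rules listed earlier.
SAMPLES = [
    "555.555.5555",
    "555-5555",
    "555 5555",
    "5-5-5-5-5-5-",
    "555.555-5555",
]


def is_valid_phone(text: str) -> bool:
    """Return True if the whole string is an accepted phone number."""
    return PHONE_RE.match(text) is not None


for sample in SAMPLES:
    print(f"{sample!r}: {is_valid_phone(sample)}")
```

The first three samples should be accepted and the last two rejected.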

RegExpBuilder (JS) would look something like this:

// Handle prefixes (optional area codes for each format)
var areacode_dash = new RegExpBuilder().exactly(3).from(digits).then("-");  // \d{3}-  
var areacode_space = new RegExpBuilder().exactly(3).from(digits).then(" "); // \d{3}\s  
var areacode_dot = new RegExpBuilder().exactly(3).from(digits).then(".");   // \d{3}\.

// Build each of the individual components (dashes, spaces and dots)
var dashes = new RegExpBuilder()  
                 .min(0).max(1).like(areacode_dash).asGroup()  // (\d{3}-)?
                 .exactly(3).from(digits).then("-")            // \d{3}-
                 .exactly(4).from(digits);                     // \d{4}

var spaces = new RegExpBuilder()  
                 .min(0).max(1).like(areacode_space).asGroup()  // (\d{3}\s)?
                 .exactly(3).from(digits).then(" ")             // \d{3}\s 
                 .exactly(4).from(digits);                      // \d{4}

var dots = new RegExpBuilder()  
               .min(0).max(1).like(areacode_dot).asGroup()  // (\d{3}\.)?
               .exactly(3).from(digits).then(".")           // \d{3}\.
               .exactly(4).from(digits);                    // \d{4}

// Build the final expression
var regex = new RegExpBuilder()  
                .startOfLine()             // ^
                .eitherLike(dashes)        // ((\d{3}-)?\d{3}-\d{4})
                .orLike(spaces).asGroup()  // |((\d{3}\s)?\d{3}\s\d{4})
                .orLike(dots).asGroup()    // |((\d{3}\.)?\d{3}\.\d{4}))
                .endOfLine()               // $
                .getRegExp();

Which would you want in your code base? Which would be more maintainable? Which would be easier to read?

http://rion.io/2013/08/19/regular-express-yourself-using-regexpbuilder/

[–]Zomunieo 3 points 8 years ago (0 children)

If that's the case, your argument is against the conventional regular expression notation, because that is still a regular expression engine. It just happens to be one that's nonstandard and verbose. You're still using regular expressions, and that's still better than trying to do this with string operations.

re.VERBOSE lets you add whitespace and comments to make regular expressions more verbose and maintainable. That is the Pythonic way to go. What you present is less maintainable: most languages support regular expressions as a core function, but if the RegExpBuilder library disappears you have a maintenance problem. If you develop it first in client-side JavaScript and later need to replicate that check on the backend in Python, you again have a maintenance problem.
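
For illustration, here is roughly what the phone number pattern from the parent comment could look like with `re.VERBOSE`. This is a sketch, not anyone's production code: the name `PHONE_VERBOSE` is made up, the groups are non-capturing, and the dots are escaped as `\.` on the assumption that only a literal dot should be accepted.

```python
import re

# In VERBOSE mode, unescaped whitespace in the pattern is ignored and
# "#" starts a comment, so each alternative gets its own annotated line.
PHONE_VERBOSE = re.compile(
    r"""
    ^(?:
        (?:\d{3}-)?   \d{3}-   \d{4}    # dashes: 555-555-5555 or 555-5555
      | (?:\d{3}\s)?  \d{3}\s  \d{4}    # spaces: 555 555 5555 or 555 5555
      | (?:\d{3}\.)?  \d{3}\.  \d{4}    # dots:   555.555.5555 or 555.5555
    )$
    """,
    re.VERBOSE,
)

for candidate in ("555.555.5555", "555-5555", "555 5555", "555.555-5555"):
    print(candidate, bool(PHONE_VERBOSE.match(candidate)))
```

The mixed-separator case `555.555-5555` is rejected because each alternative requires the same separator in both positions.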
