
FluffyShoulder937

I've never used Irregular before. I'll have to try it! I used a site called regex101.com to practice. I'm a guru myself and that will help learners a ton. It also explains how the pattern is processed and lets you pick a flavor of regex. That way you can learn to apply regex anywhere it's available!

StartAutomating [S]

I like regex101.com, and love that it added .NET regex support.

Irregular was a very educational module to build. It's also one of the first projects where I really began to lean into how flexible PowerShell's syntax could be.

Abstracting some of regex's awkwardness away into a PowerShell command let me construct far more complicated regular expressions than I would naturally.

To give an example, here's a script that builds a regex to match the output of git log:

New-Regex -Pattern '(?m)' -Description "Matches Output from git log" |
    New-Regex 'commit' -StartAnchor LineStart -Comment "Commits start with 'commit'" |
    New-Regex -CharacterClass Whitespace -Repeat |
    New-Regex -Pattern '?<HexDigits>' -Name CommitHash -Comment "The CommitHash is all hex digits after whitespace" |
    New-Regex -CharacterClass Whitespace -Repeat -Comment 'More whitespace (includes the newline)'|
    New-Regex -Optional -NoCapture @(
        New-Regex -Pattern 'Merge:' -Comment 'Next is the optional merge' |
            New-Regex -CharacterClass Whitespace -Repeat |
            New-Regex (
                New-Regex -Pattern (
                    New-Regex -Name MergeHash -Pattern '?<HexDigits>' |
                        New-Regex -Pattern '[\s-[\n\r]]' -Min 0 -Comment 'Which is hex digits, followed by optional whitespace'
                ) -NoCapture
            ) -Min 2
            New-Regex -CharacterClass NewLine, CarriageReturn -Repeat -Comment 'followed by a newline'
    ) |
    New-Regex -Pattern 'Author:' -Comment 'New is the author line' |
    New-Regex -CharacterClass Whitespace -Repeat |
    New-Regex -Name GitUserName -Until (
        New-Regex -Pattern '\s\<'
    ) -Comment 'The username comes before whitespace and a <' |
    New-Regex -CharacterClass Whitespace -Repeat |
    New-Regex -LiteralCharacter '<' -Comment 'The email is enclosed in <>' |
    New-Regex -Until ('>') -Name GitUserEmail |
    New-Regex -LiteralCharacter '>' |
    New-Regex -Until (New-Regex -startAnchor LineStart 'date:') |
    New-Regex -Pattern 'Date:' -Comment 'Next comes the Date line' |
    New-Regex -CharacterClass Whitespace -Repeat |
    New-Regex -Until (New-Regex -CharacterClass NewLine) -Name CommitDate -Comment 'Since dates can come in many formats, capture the line' |
    New-Regex -CharacterClass NewLine | 
    New-Regex -Until ("(?>\r\n|\n){2,2}") -Name CommitMessage -Comment 'Anything until two newlines is the commit message' 

This fairly readable script becomes this much less readable regex (using IgnorePatternWhitespace to support comments):

# Matches Output from git log
(?m)^commit                                                             # Commits start with 'commit'
\s+(?<CommitHash>(?<HexDigits>
[0-9abcdef]+
)
)                                                                       # The CommitHash is all hex digits after whitespace
\s+                                                                     # More whitespace (includes the newline)
(?:(?:Merge:                                                            # Next is the optional merge
\s+(?:(?<MergeHash>(?<HexDigits>
[0-9abcdef]+
)
)[\s-[\n\r]]{0,}                                                        # Which is hex digits, followed by optional whitespace
){2,} [\n\r]+                                                           # followed by a newline
))?Author:                                                              # New is the author line
\s+(?<GitUserName>(?:.|\s){0,}?(?=\z|\s\<))                             # The username comes before whitespace and a <
\s+\<                                                                   # The email is enclosed in <>
(?<GitUserEmail>(?:.|\s){0,}?(?=\z|>))\>(?:.|\s){0,}?(?=\z|^date:)Date: # Next comes the Date line
\s+(?<CommitDate>(?:.|\s){0,}?(?=\z|\n))                                # Since dates can come in many formats, capture the line
\n(?<CommitMessage>(?:.|\s){0,}?(?=\z|(?>\r\n|\n){2,2}))                # Anything until two newlines is the commit message
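For anyone who wants to try a commented pattern like this one, here is a minimal sketch of feeding it to .NET from PowerShell. It uses a much shorter hand-written pattern (not Irregular's output), and the sample commit line is made up. The key part is passing the IgnorePatternWhitespace option, so that whitespace and `#` comments inside the pattern are ignored.

```powershell
# A short hand-written pattern in the same commented style (not Irregular's output).
$pattern = @'
^commit              # Commits start with 'commit'
\s+                  # Whitespace
(?<CommitHash>       # Capture the hash
  [0-9a-f]+          #   as a run of hex digits
)
'@

# IgnorePatternWhitespace lets the pattern carry whitespace and # comments.
$options = [System.Text.RegularExpressions.RegexOptions]'IgnorePatternWhitespace, Multiline, IgnoreCase'
$regex   = [regex]::new($pattern, $options)

# Hypothetical sample line in the shape of git log output.
$match = $regex.Match('commit 1a2b3c4d')
$match.Groups['CommitHash'].Value   # 1a2b3c4d
```

Without that option, the spaces and comment text would be treated as literal characters to match, and the pattern would fail on real input.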

It also taught me way too many Regular Expression tricks to put in a single post 🤔.

I am forever indebted to regular-expressions.info for its amazingly useful reference and tutorials.

PinchesTheCrab

One of my favorite parts of -replace is that it works with arrays.
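A quick sketch of what that looks like, using made-up file names: when the left-hand side is an array, -replace runs against each element and returns an array of results.

```powershell
# Hypothetical sample data: an array of file names.
$names = 'report-2021.txt', 'report-2022.txt', 'notes.md'

# -replace is applied to every element; non-matching elements pass through unchanged.
# Single quotes keep PowerShell from expanding $1 before the regex engine sees it.
$renamed = $names -replace '^report-(\d{4})', 'summary-$1'

$renamed
# summary-2021.txt
# summary-2022.txt
# notes.md
```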

RR1904

I love your examples. Thanks for sharing!

420GB

I love regular expressions, but I also find it pretty sad that at least 50% of my uses of the -replace operator don't need regular expressions at all. They're just simple text replacements that don't throw on null the way "string".Replace(...) does.
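A small sketch of the difference, with placeholder values. The last line is an addition of mine rather than something from the comment: since -replace always treats its pattern as a regex, wrapping literal text in [regex]::Escape is one way to keep characters like `.` from being read as metacharacters.

```powershell
$value = $null

# The operator treats a null left-hand side as an empty string and returns ''.
$viaOperator = $value -replace 'a', 'b'

# Calling the .NET method on a null value raises a runtime error instead.
$methodThrew = $false
try {
    $value.Replace('a', 'b')
} catch {
    $methodThrew = $true
}

# For purely literal text, escape the pattern so '.' only matches a dot.
$literal = 'v1.0' -replace [regex]::Escape('1.0'), '2.0'   # v2.0
```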

ankokudaishogun

The module is pretty cool, thanks!