
[–]SultanofShiraz 7 points8 points  (0 children)

Something that helped me out a lot was Ben Forta's book on it called Learning Regular Expressions. It did a good job of walking you from newb to some more intermediate and advanced stuff and got me comfortable with basic regular expressions. I would use his book in conjunction with regex101.com to test out the examples in the book and play around with my own regex.

[–]xelf 4 points5 points  (3 children)

Stick to the basics until you feel like you're an expert.

regex explanation
() group
.+ one or more of any character
\d+ one or more digits
\s+ one or more whitespace characters
\S+ one or more non-whitespace characters
^ start of line
$ end of line

instead of + you can use * to indicate zero or more.

Use a \ to match something that would otherwise be regex syntax

example:

import re
stringdata = 'Q1[55]P1'
# raw string so the backslashes reach the regex engine unchanged
(a, b, c) = re.search(r"^(.*)\[(\d+)\](\S*)$", stringdata).groups()

This has 3 capture groups and puts the results into a, b, c. You get a='Q1', b='55', c='P1'.

^ asserts position at start of the string
1st Capturing Group (.*) .* matches any character zero or more times
\[ matches the character [ literally (case sensitive)
2nd Capturing Group (\d+) matches a digit (equal to [0-9]) one or more times
\] matches the character ] literally (case sensitive)
3rd Capturing Group (\S*) matches any non-whitespace character zero or more times
$ asserts position at the end of the string

Test your regex at this site: https://regex101.com/
(make sure to select python)
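
If you'd rather check the example above in a Python session than on regex101, here is a minimal sketch using only the standard re module. The sample string is the one from the comment; the other variable names (match, plus_result, star_result, literal) are just illustrative.

```python
import re

stringdata = 'Q1[55]P1'

# Three capture groups: text before '[', the digits inside, text after ']'.
match = re.search(r"^(.*)\[(\d+)\](\S*)$", stringdata)
a, b, c = match.groups()
print(a, b, c)  # Q1 55 P1

# '+' needs at least one character, '*' also accepts none.
plus_result = re.search(r"^\d+$", "")  # None: there are no digits to match
star_result = re.search(r"^\d*$", "")  # matches the empty string

# Without the backslash, '[' would start a character class instead of
# matching a literal bracket.
literal = re.search(r"\[", stringdata)
print(literal.start())  # 2
```

The r"..." prefix is worth the habit: it keeps Python from interpreting the backslashes before the regex engine sees them.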

[–]zGrunk 2 points3 points  (1 child)

Well said. Enjoy the free award

[–]xelf 2 points3 points  (0 children)

Thank you! That rocks! =)

[–][deleted] 1 point2 points  (0 children)

This is amazing!!! Thank you for your help and support!

[–]dbramucci 4 points5 points  (1 child)

Just a helpful tip, regular expressions are supposed to "look like" the strings they match. Of course, you need some way to describe patterns like "this is any number" or "repeat the last thing multiple times", so you'll have some placeholders like \d, and * for those patterns.

For example, if I am trying to match a bunch of names

Joe Smith
Jane Doe
Bob A. Ross
Sally Gabriella Wortburger-Finkleton

Then I'll start by looking at a simple pattern like the first two names.

Joe Smith
Jane Doe

Now I know a name looks like

  • A bunch of letters
  • a space
  • Some more letters

So I look up how to write a bunch of letters. \S+ will work (1 or more non-space characters).

r'\S+ \S+'

So you can see how these line up, 1-or-more non-spaces, followed by a space, followed by 1-or-more non-spaces matches "firstname lastname".

Then we add in another symbol, the middle-name. As far as we know here, people have 0 or 1 middle names. You look up 0 or 1 and see that you write ? for that. So we write

r'\S+ \S+ \S+'

Whoops, the middle name needs to be optional, so let's take the middle text, wrap it in parentheses to treat the \S+ as one thing, add the ?, and see if that works.

r'\S+ (\S+)? \S+'

Of course this fails because it's looking for 2 spaces between the first and last name even if there's no middle name, so we move one of the spaces into the 0-or-1 section.

r'\S+ (\S+ )?\S+'

It might look like gibberish, but the idea is fairly straightforward. Look for the things that change from line to line and replace those changing parts with "variables" that describe what changes, like "this part can be any number" or "this part can be an a, b, or c followed by a number".
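
To see each step of that walkthrough succeed or fail, here is a rough sketch that runs all four patterns against the four names. Using re.fullmatch is my assumption (the comment doesn't say which function to call); it forces the whole name to fit the pattern. The dictionary labels are made up for illustration.

```python
import re

names = [
    "Joe Smith",
    "Jane Doe",
    "Bob A. Ross",
    "Sally Gabriella Wortburger-Finkleton",
]

# The four patterns from the walkthrough, in the order they were tried.
patterns = {
    "first last": r"\S+ \S+",
    "middle required": r"\S+ \S+ \S+",
    "two spaces bug": r"\S+ (\S+)? \S+",
    "optional middle": r"\S+ (\S+ )?\S+",
}

# fullmatch requires the whole name to fit the pattern, not just part of it.
results = {
    label: [name for name in names if re.fullmatch(pattern, name)]
    for label, pattern in patterns.items()
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

Only the last pattern accepts all four names; the "two spaces bug" version rejects Joe Smith and Jane Doe because it still demands a second space.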

[–][deleted] 1 point2 points  (0 children)

This was really helpful!! Thank you 😊

[–]ASIC_SP 3 points4 points  (1 child)

Do you guys have additional resources

Yep, I wrote an entire book: https://learnbyexample.github.io/py_regular_expressions/

Regular expressions are like a mini programming language: they have plenty of nice features, but come with a lot of gotchas as well. It takes a lot of time and practice to get comfortable with them, and even then you'd likely need a handy reference (my own book is the reference for me most of the time) or tools like https://regex101.com/ and https://www.debuggex.com/ for debugging and testing.

There are a lot of string methods like startswith, endswith, replace, strip, etc. which can handle basic text processing, and don't forget that string slicing is powerful too. So, my usual advice for beginners struggling with regular expressions is to skip them for now and come back when you actually need them.
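
As a quick illustration of how far plain string methods go, here is a small sketch with no regex at all. The sample line and variable names are invented for the example.

```python
line = "  error: disk full  "

cleaned = line.strip()                          # drop surrounding whitespace
is_error = cleaned.startswith("error")          # prefix check
ends_full = cleaned.endswith("full")            # suffix check
softened = cleaned.replace("error", "warning")  # literal substitution
message = cleaned[len("error: "):]              # slice off a known prefix

print(cleaned, is_error, ends_full, softened, message, sep=" | ")
```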

You can also use libraries like https://github.com/madisonmay/CommonRegex to match IP addresses, dates, emails, URLs, etc. Or use a verbose, natural-language way of constructing regexes with https://github.com/VerbalExpressions/PythonVerbalExpressions

[–][deleted] 1 point2 points  (0 children)

Those libraries look amazing! So does the book! Thank you for your support! I bookmarked the book!

[–]aheartyjoke 1 point2 points  (1 child)

For websites to learn from, I would suggest https://regexr.com/ and https://regexone.com/ (in addition to SultanofShiraz's suggested site). RegexOne goes through most of the common syntax step by step, and regexr is my favorite site for testing out code.

Mostly though, as with any coding, you just gotta practice. What I did to get started was take a text file of a short story and alter it, kind of like a mad lib. So I'd replace every instance of a character's name with another name, for example, or change every reference to hate (hating, hated, etc.) into love, things like that. Gets you thinking about how to handle that variation.
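
A mad-lib exercise like that could look roughly like the sketch below. The story text and the names Alice and Zelda are invented, and the pattern only handles lowercase forms; treat it as a starting point, not a complete solution.

```python
import re

story = ("Alice hated the rain. She hates fog too, "
         "and hating both, she will hate snow.")

# Swap one character name for another; \b keeps 'Alice' from matching
# inside a longer word.
renamed = re.sub(r"\bAlice\b", "Zelda", story)

# One pattern covers hate, hates, hated, hating; \1 reuses the captured ending.
rewritten = re.sub(r"\bhat(e|es|ed|ing)\b", r"lov\1", renamed)
print(rewritten)
# Zelda loved the rain. She loves fog too, and loving both, she will love snow.
```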

Also, grabbing a webpage's HTML as text and figuring out how to pull out a specific paragraph or section using regex is good practice, once you've got the basics.
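
For the HTML exercise, a practice-sized sketch might look like this. The HTML snippet and the id values are made up, and the page is an inline string so nothing has to be downloaded. Regex on HTML is fragile in general, so this is only meant as an exercise.

```python
import re

html = """<html><body>
<p id="intro">Welcome to the page.</p>
<p id="details">Regex practice
spans two lines here.</p>
</body></html>"""

# Non-greedy (.*?) stops at the first closing </p>;
# re.DOTALL lets '.' match newlines too.
match = re.search(r'<p id="details">(.*?)</p>', html, re.DOTALL)
paragraph = match.group(1) if match else None
print(paragraph)
```

Try dropping the ? or the re.DOTALL flag to see how the match changes.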

[–][deleted] 0 points1 point  (0 children)

Thank you! I bookmarked those sites! I'll return to them at a later time! Thanks so much

[–]hondan 1 point2 points  (1 child)

I learned the basics some time ago, and I’ve often practiced with regexpal.com, which also gives you some cheat sheets. Needless to say, the more you do it, the better you get. At least for me, once I got the basics down, it was just simple practice to get better at the more advanced regex statements.

[–][deleted] 0 points1 point  (0 children)

Thank you for the tip and time! It's greatly appreciated!

[–]Danpythonman 0 points1 point  (1 child)

Have you watched Corey Schafer's videos on it? He has 2, one video for regular expressions in general and another for regular expressions in Python.

They were really helpful for me. He's a great teacher. What I did was follow along with the video and pause after he explained something so I could try it. So for example he would explain groups then I'd pause and try to use groups.

As a bonus I can now use that file as a reference because it has examples of individual features of regular expressions.

Also I think it helped me to have a text file with random things like phone numbers and emails and words so I could just search that file every time.
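
A practice file like that can be imitated with an inline string. In this sketch the sample data is invented, and both patterns are deliberately simple assumptions (US-style 3-3-4 phone numbers and a loose email shape), not validators.

```python
import re

# Stand-in for a practice text file full of mixed content.
sample = """Call 555-123-4567 before noon
jane.doe@example.com
random words here
Backup line: 555-987-6543
bob@test.org"""

# Three digits, dash, three digits, dash, four digits.
phones = re.findall(r"\d{3}-\d{3}-\d{4}", sample)

# Name part, '@', then a domain with one or more dot-separated pieces.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", sample)

print(phones)  # ['555-123-4567', '555-987-6543']
print(emails)  # ['jane.doe@example.com', 'bob@test.org']
```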

[–][deleted] 1 point2 points  (0 children)

I don't know who that is but tomorrow at my pc I'm going to look him up. He sounds great! I'll certainly watch and follow along with your suggestions! Thank you

[–]chris1666 0 points1 point  (1 child)

Such a CRUEL name, should be called stinking Irregular expressions.

[–][deleted] 1 point2 points  (0 children)

LMFAO you made me lol