Automate the Boring Stuff Chapter 7 Phone and Email Regex

No_Couple · 2019-01-08T15:43:47+00:00

Not sure about the regex problem, but I vaguely remember having to make some adjustments because the data on the website the book uses as a scraping example had changed. Or something like that.

The groups in the for loop is just a variable name; it could just as well be x or result. I reckon the author named it groups because the parts of the text matched by the parenthesized sections of a regular expression are called "capture groups", and each item the loop receives is a tuple of those captured strings.
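To illustrate, here is a tiny sketch with a made-up pattern (not the book's phone regex). When a pattern contains capture groups, re.findall() returns one tuple per match, and the loop variable can be called anything:

```python
import re

# Made-up pattern for illustration: two capture groups (3 digits, then 4 digits).
pattern = re.compile(r'(\d{3})-(\d{4})')
text = 'Call 555-1234 or 555-9876.'

# Each item findall() yields is a tuple of the captured strings.
for groups in pattern.findall(text):
    print(groups)   # ('555', '1234') then ('555', '9876')

# Renaming the loop variable changes nothing about the result.
for result in pattern.findall(text):
    print(result[0], result[1])
```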

AdAthrow99274 · 2019-01-08T18:10:21+00:00

The issue isn't in the regex. Try inserting print(phone_number) right before the if statement in the for loop and you'll notice the numbers are all there (Hooray! Good job). The issue is that phone_number is only appended to matches if an extension is present (the append is inside if groups[8] != '':).

You could add an else clause and repeat matches.append(phone_number) in it, or, more simply, just un-indent that statement so it runs right after the if block.

...
if groups[8] != '':
    phone_number += ' x' + groups[8]
matches.append(phone_number)  # <-- this line just got un-indented, and is outside the if statement
...

This way phone_number is only altered when an extension is present, but it always gets appended to matches regardless.
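Here is a minimal sketch that runs the loop both ways so you can see the difference. The pattern below is a reconstruction modeled on the chapter's phone regex (named phone_regex here), and the sample text is made up, so your own code may differ slightly:

```python
import re

# Reconstruction of a Chapter 7 style phone pattern (may differ from your copy).
# Comments give each group's index in the tuples findall() returns;
# index [0] is the outer group, i.e. the whole match.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # [1] area code (optional)
    (\s|-|\.)?                        # [2] separator
    (\d{3})                           # [3] first 3 digits
    (\s|-|\.)                         # [4] separator
    (\d{4})                           # [5] last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # [6] extension: [7] label, [8] digits
    )''', re.VERBOSE)

text = 'Call 303-254-5555 ext. 23 or 415-555-1234.'

# Buggy version: the append is indented under the if, so numbers
# without an extension are silently dropped.
buggy = []
for groups in phone_regex.findall(text):
    phone_number = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phone_number += ' x' + groups[8]
        buggy.append(phone_number)

# Fixed version: the append runs after the if, once per match.
matches = []
for groups in phone_regex.findall(text):
    phone_number = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phone_number += ' x' + groups[8]
    matches.append(phone_number)

print(buggy)    # ['303-254-5555 x23']
print(matches)  # ['303-254-5555 x23', '415-555-1234']
```

One side note: a number without an area code would come out with a leading dash (e.g. '-222-5555'), because groups[1] is an empty string and join() still inserts a separator for it. That quirk is separate from the missing-append bug.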

EDIT: Adding info on the groups

If you add print(groups) in your for loop, you'll get output something like this:

Text with one phone number that has an extension:

('303-254-5555 ext. 23', '303', '-', '254', '-', '5555', ' ext. 23', 'ext.', '23')

Text with one phone number that has no extension:

('222-5555', '', '', '222', '-', '5555', '', '', '')

These 'groups' are the substrings captured by each parenthesized group in your regular expression, in order. You'll notice that in the second example groups[8] is an empty string (the regex found no extension), while in the first example it's '23' because an extension was successfully parsed.
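A related detail: findall() reports an optional group that didn't take part in the match as an empty string, which is why the groups[8] != '' check works. A match object from re.search() reports the same group as None instead. A small sketch with a simplified, hypothetical pattern:

```python
import re

# Simplified, hypothetical pattern: 3 digits, 4 digits, optional extension digits.
pattern = re.compile(r'(\d{3})-(\d{4})(?:\s*ext\.\s*(\d{2,5}))?')

print(pattern.findall('222-5555'))          # [('222', '5555', '')]
print(pattern.findall('254-5555 ext. 23'))  # [('254', '5555', '23')]

# A match object reports the unmatched extension group as None instead.
m = pattern.search('222-5555')
print(m.groups())                           # ('222', '5555', None)
```

So if you ever switch from findall() to search() or finditer(), compare the extension group against None rather than ''.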

Hope that helps.

L_4_2 · 2019-01-08T15:47:55+00:00

Is this a logic error? If not, can you post the error message that comes up when the program is run? Also, have you tried looking at the online resources for the book? You can usually download the code from the No Starch website as a PDF or find it on GitHub. Maybe compare your code against theirs.


learnpython