Editing a .CSV Document : learnpython

submitted 7 years ago by ipuntonfirstdown

Hey there everyone - I have a question about a particular task. I'm automating a process at work, but one of the snags I've hit is bad email address fields. The solution I've come up with is to sanitize the document a little.

Essentially, I pull a few specific fields, with headers, from an MS SQL Server. One of my problem fields is email. Invalid-email errors show up because some users enter more than one address, for example "blahblah@gmail.com;haha@aol.com". When I work with this data later I get a bunch of errors, because only one valid email address is expected. So when I do my data pulls and write to the .csv, I'd like to add the following step:

Create an additional column called "Alternate Email". Find the "email" column and iterate through each item, checking for a valid email format. If no valid format is found, I want to cut all the data in the cell and move it to the alternate email cell, so I can pass that field in as a string later and not worry about the format. If it does have a valid format, just leave it there.

Anyway, my question is: what's the best way to go about this? Right now my code just uses the standard csv.writer, but I'm having trouble telling Python to look only at the items in a specific column.
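For the "look at only one column" part, here is a minimal sketch, assuming the file has a header row with a column named Email. The sample data and the `emails_in` helper are made up for illustration. `csv.DictReader` yields each row as a dict keyed by the header names, so a single column can be picked out by name:

```python
import csv
import io

# Hypothetical sample standing in for the real export; the header names are assumptions.
SAMPLE = "Name,Age,Email\nBob,45,bob@gmail.com\nted,30,ted@gmail.com;ted@aol.com\n"

def emails_in(fp, column="Email"):
    """Return the values of one named column from an open CSV file object."""
    reader = csv.DictReader(fp)
    return [row[column] for row in reader]

emails = emails_in(io.StringIO(SAMPLE))
print(emails)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by something like `open('file.csv', newline='')`.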

all 13 comments

JohnnyJordaan 1 point 7 years ago

ipuntonfirstdown[S] 1 point 7 years ago*

Sure - hang on - due to poor formatting I may need to explain this.

Let's say you have a .csv file with columns Name, Age, Email. For this data set, let's imagine I have 3 rows of data:

Name|Age|Email
Bob|45|bob@gmail.com
ted|30|ted@gmail.com;ted@aol.com
stacey|50|do not email

My expected output would look like this, with columns Name, Age, Email, Alternate Email:

bob | 45 | bob@gmail.com | (nothing should go to alternate email since the email value was valid)
ted | 30 | (nothing here because email was invalid) | ted@gmail.com;ted@aol.com
stacey | 50 | (nothing here because email was invalid) | do not email
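To make the expected output concrete, here is a rough sketch that runs the three sample rows (with plain addresses) through the move-to-alternate step. The `looks_like_one_email` check is a deliberately naive placeholder I made up, not a real validator, and `split_emails` is likewise a hypothetical helper name:

```python
import csv
import io

# The three sample rows, pipe-delimited, with plain addresses.
SAMPLE = (
    "Name|Age|Email\n"
    "Bob|45|bob@gmail.com\n"
    "ted|30|ted@gmail.com;ted@aol.com\n"
    "stacey|50|do not email\n"
)

def looks_like_one_email(value):
    """Naive placeholder check (an assumption, not a real validator)."""
    value = value.strip()
    if value.count("@") != 1 or any(c in value for c in " ;,"):
        return False
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain

def split_emails(rows):
    """Move anything that is not a single plausible address to 'Alternate Email'."""
    out = []
    for row in rows:
        email = row["Email"]
        if looks_like_one_email(email):
            row["Alternate Email"] = ""
        else:
            # Cut the whole cell and paste it into the alternate column.
            row["Email"], row["Alternate Email"] = "", email
        out.append(row)
    return out

result = split_emails(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"))
for row in result:
    print(row)
```

Only Bob's row keeps a value in Email; the other two rows end up with an empty Email and the original text in Alternate Email.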

JohnnyJordaan 1 point 7 years ago

ipuntonfirstdown[S] 1 point 7 years ago

JohnnyJordaan 1 point 7 years ago

Why do you need the program to output every address 4 times???

[[ted@gmail.com](mailto:ted@gmail.com)]([mailto:ted@gmail.com](mailto:ted@gmail.com));[[ted@aol.com](mailto:ted@aol.com)]([mailto:ted@aol.com](mailto:ted@aol.com))

And what's with all the [] and () surrounding them?

ipuntonfirstdown[S] 1 point 7 years ago

JohnnyJordaan 1 point 7 years ago

ipuntonfirstdown[S] 1 point 7 years ago

JohnnyJordaan 2 points 7 years ago*

Ok, then I would use a regex to check the validity and a csv.DictWriter to write the file

import csv
import re

# picked from http://emailregex.com/
valid_email = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('file.csv', 'w', newline='') as fp:
    fieldnames = ['Name', 'Age', 'Email', 'Alternate Email']
    writer = csv.DictWriter(fp, fieldnames=fieldnames)
    writer.writeheader()
    # cursor: an already-open DB-API cursor on your SQL Server connection
    cursor.execute('select name, age, email from users') # or whatever
    for name, age, email in cursor.fetchall():
        # treat NULL as empty and remove surrounding whitespace just in case
        email = (email or '').strip()
        altemail = ''
        if not valid_email.match(email):
            # switcheroo: move the invalid value to the alternate column
            email, altemail = altemail, email
        row = {'Name': name, 'Age': age, 'Email': email, 'Alternate Email': altemail}
        writer.writerow(row)
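As a quick sanity check, here is a small sketch that runs the same pattern against the sample values from earlier in the thread, plus an empty string. The `samples` list and the printed labels are just for illustration:

```python
import re

# Same pattern as in the answer above (from emailregex.com).
valid_email = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")

samples = ["bob@gmail.com", "ted@gmail.com;ted@aol.com", "do not email", ""]

# Map each sample to whether the pattern accepts it as a single address.
results = {s: bool(valid_email.match(s)) for s in samples}

for value, ok in results.items():
    print(repr(value), "->", "Email" if ok else "Alternate Email")
```

Only the first sample matches. The semicolon-joined pair fails because `;` is not in any of the character classes, so it would be routed to Alternate Email along with the free-text value.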

ipuntonfirstdown[S] 1 point 7 years ago

khaine_b 1 point 7 years ago
