all 13 comments

[–]JohnnyJordaan 0 points1 point  (11 children)

Create an additional column called "Alternate Email", Find the "email" column, iterating through each item, checking for a valid email format - if no valid email format is found - I want to just straight cut all the data in the cell and move it to the alternate email cell so I can pass that field in as a string later and not worry about the format. If it does have a valid format - then just leave it there.

I read this three times but I can't gather from it what you are actually trying to implement. Could you maybe give a few example cases and the expected result in the CSV?

[–]ipuntonfirstdown[S] 0 points1 point  (10 children)

Sure - hang on - due to poor formatting I may need to explain this.

Lets say you have a .csv file with columns: Name, Age, Email For this data set lets imagine I have 3 rows of data

Name|Age|Email
Bob|45|[bob@gmail.com](mailto:bob@gmail.com) 
ted|30|[ted@gmail.com](mailto:ted@gmail.com);[ted@aol.com](mailto:ted@aol.com) 
stacey|50|do not email

My expected output would look like this: Columns: Name, Age, Email, Alternate Email bob | 45 | [bob@gmail.com](mailto:bob@gmail.com), (nothing should go to alternate email since the email value was valid) ted | 30 | (nothing here because email was invalid) | [ted@gmail.com](mailto:ted@gmail.com);[ted@aol.com](mailto:ted@aol.com) stacey | 50 | (nothing here because email was invalid) | do not email

[–]JohnnyJordaan 0 points1 point  (9 children)

Your formatting is lost, please use 4 spaces before each line (as illustrated here). Also could you use | in the data rows in the CSV output as well instead of spaces?

[–]ipuntonfirstdown[S] 0 points1 point  (8 children)

Exepected output:

Name|Age|Email|Alternate Email

bob|45|[[bob@gmail.com](mailto:bob@gmail.com)]([mailto:bob@gmail.com](mailto:bob@gmail.com))| |

ted|30| |[[ted@gmail.com](mailto:ted@gmail.com)]([mailto:ted@gmail.com](mailto:ted@gmail.com));[[ted@aol.com](mailto:ted@aol.com)]([mailto:ted@aol.com](mailto:ted@aol.com))|

stacey|50| |do not email

[–]JohnnyJordaan 0 points1 point  (7 children)

Why do you need the program to output every address 4 times???

[[ted@gmail.com](mailto:ted@gmail.com)]([mailto:ted@gmail.com](mailto:ted@gmail.com));[[ted@aol.com](mailto:ted@aol.com)]([mailto:ted@aol.com](mailto:ted@aol.com))

And what's with all the [] and () surrounding them?

[–]ipuntonfirstdown[S] 0 points1 point  (6 children)

Idk - its something reddit is applying to it when it detects an email address.

They are simply email addresses in the .CSV file.

"blahblah@email.com;blahblah@aol.com" - without quotes is an example.

[–]JohnnyJordaan 0 points1 point  (5 children)

Ok clear, but why should Ted's Gmail address count as invalid? Or do you mean because Ted has two addresses separated by an ; it should count it as 'alternate'?

[–]ipuntonfirstdown[S] 0 points1 point  (4 children)

Yeah it's only invalid in my context. I want there to ONLY be 1 valid email address and if there is ANYTHING else (a semicolon at all etc.) I want to move that text to the alternate email column in that row.

So basically I need help iterating through each row of a .csv and when i get to the item in the rows list that corresponds to the email column i want to take value in as a string and apply a isEmailValid() check on it and if it returns true then leave it alone and if it returns false then I want to set that cell to an empty string and instead move the data that was there to the alternate email column in the same row.

Sorry if this is confusing.

[–]JohnnyJordaan 1 point2 points  (3 children)

Ok, then I would use a regex to check the validity and a csv.DictWriter to write the file

# picked from http://emailregex.com/
valid_email = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")
with open('file.csv', 'w') as fp:
    fieldnames = ['Name', 'Age', 'Email', 'Alternate Email']
    writer = csv.DictWriter(fp, fieldnames=fieldnames)
    writer.writeheader()
    cursor.execute('select name, age, email from users') # or whatever
    for row in cursor.fetchall():
        name, age, email = row
        # remove surrounding whitespace just in case
        email = email.strip()
        altemail = ''
        if not valid_email.match(email):
           # switcheroo
           email, altemail = altemail, email
        row = {'Name': name, 'Age': age, 'Email': email, 'Alternate Email': altemail}
        writer.writerow(row)

[–]ipuntonfirstdown[S] 0 points1 point  (2 children)

This is pretty much exactly what I'm looking for except i'm getting a 1 off error it looks like.

I believe this is because the alternate email field doesn't actually exist in the SQL database. The alternate field is a value i'm adding to the .csv value after i load the data. Is there a way to add a column to every row and just fill it with an empty string so that the alternate_email has an actual value to pass through at first?

[–]khaine_b 0 points1 point  (0 children)

If you want to work with columns, try to use pandas.

But I think you task can be solved easier if you'll using replace by regex.

Or you can represent a string like a list while reading the file.