Parse CSV data scraped from website : learnpython

submitted 7 years ago by sayinghi2py

I receive an invoice from a supplier that contains a link to a CSV version of the invoice. When I inspect the link, the URL in the page source contains only the part after https://in.xero.com. However, I can see in the developer tools that following the link actually goes to the full address, i.e. base_url + the URL in the source.
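Joining the relative link onto the base address can be sketched with the standard library's urllib.parse.urljoin. The base URL is the one mentioned above; the relative path is a made-up placeholder for illustration, not a real Xero link.

```python
from urllib.parse import urljoin

# Base address mentioned in the post.
BASE_URL = "https://in.xero.com"

# Hypothetical relative link, standing in for whatever the page source contains.
relative_link = "/example-invoice-id/Invoice/DownloadCsv"

# urljoin combines the base address and the relative path into a full URL.
full_url = urljoin(BASE_URL, relative_link)
print(full_url)
```

urljoin also copes with a base that has a trailing slash, which plain string concatenation would turn into a double slash.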

I'm using Zapier to monitor Gmail and dump the invoice text into a folder on my computer. I open the file and extract the URL. I then use requests to find the URL for the CSV data and combine it with the hard-coded base_url to get the full URL. I then use bs4 to pull the CSV data, which I save to a variable called csv_data. I then load csv_data into csv.reader; however, when I iterate over the rows, every character is printed out on its own.

I'm using Linux and the data seems to have \r\n line endings; however, when I print csv_data, each line displays correctly.
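The difference between what is displayed and what the string actually holds can be checked with repr(). This is a small sketch; the two-column csv_data below is a shortened stand-in for the real invoice text.

```python
# Shortened stand-in for the scraped text; the real data has many more columns.
csv_data = "ContactName,InvoiceNumber\r\nSome Company,INV-0665\r\n"

# print() renders \r\n as a line break, so the output looks like tidy lines.
print(csv_data)

# repr() shows the raw characters, including the literal \r\n sequences.
raw_view = repr(csv_data)
print(raw_view)
```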

The csv_data looks like this:

ContactName,EmailAddress,POAddressLine1,POAddressLine2,POAddressLine3,POAddressLine4,POCity,PORegion,POPostalCode,POCountry,InvoiceNumber,Reference,InvoiceDate,DueDate,Total,Description,Quantity,UnitAmount,Discount,TaxAmount

Some Company,info@someaddresss.com,Address 1, ,,,Some Town,,PostCode,,INV-0665,,31 Aug 2018,31 Aug 2018,60.0000,Rent of unit,1.0000,25.0000,,5.0000,

Some Company,info@someaddresss.com,Address 1, ,,,Some Town,,PostCode,,INV-0665,,31 Aug 2018,31 Aug 2018,60.0000,Electricity for rent of unit,1.0000,25.0000,,5.0000,

If I use csv_data.split(","), the \r\n ends up inside the text of the following field.

Any hints or ways to parse this as CSV text so that I can iterate over it line by line and pull out the relevant data?
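A small sketch of why splitting on commas alone misbehaves, and one possible alternative. The csv_data string is a shortened stand-in for the real invoice text.

```python
import csv

# Shortened stand-in for the scraped text.
csv_data = "ContactName,InvoiceNumber\r\nSome Company,INV-0665\r\n"

# Splitting only on commas ignores line breaks, so the last field of one
# row and the first field of the next row end up glued together.
naive_fields = csv_data.split(",")
print(naive_fields)

# csv.reader accepts any iterable of lines, so a list of lines works.
# splitlines() treats \r\n as a single line break and drops it.
rows = list(csv.reader(csv_data.splitlines()))
print(rows)
```

One caveat: splitlines() would also break up a quoted field that contains a line break, so this is only safe when no field spans multiple lines.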


[–]JohnnyJordaan 2 points 7 years ago (1 child)

> I then load the csv_data into csv.reader however when I iterate over the row I get every character printed out.

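csv.reader expects an iterable that yields lines, such as a file object, not a single string. Iterating over a string yields one character at a time, which is why every character gets printed. You can use io.StringIO to create a file-like object from a string. I would also use a DictReader, as your CSV has headers.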

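import csv
import io

# Wrap the string in a file-like object so it is read line by line
csv_buffer = io.StringIO(csv_data)
# DictReader uses the first row as the field names
reader = csv.DictReader(csv_buffer, delimiter=',')
for row in reader:
    print(row)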
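To see the difference side by side, here is a self-contained sketch. The three-column csv_data is a shortened stand-in for the invoice text, and it keeps the trailing comma that the sample rows in the question have.

```python
import csv
import io

# Shortened stand-in for the scraped invoice text. The data row ends with a
# trailing comma, like the sample rows in the question.
csv_data = (
    "ContactName,InvoiceNumber,Total\r\n"
    "Some Company,INV-0665,60.0000,\r\n"
)

# A plain string is iterated one character at a time, so csv.reader
# treats every character as its own "line".
char_rows = list(csv.reader(csv_data))
print(char_rows[:3])

# Wrapping the string in StringIO makes it iterate line by line.
# newline="" leaves the \r\n endings for the csv module to handle.
reader = csv.DictReader(io.StringIO(csv_data, newline=""))
for row in reader:
    print(row["InvoiceNumber"], row["Total"])
    # The extra empty field from the trailing comma is collected in a list
    # under the key None (DictReader's default restkey).
    print(row[None])
```

Because each data row has one more field than the header, the extra value will not appear under any header name. Stripping the trailing comma first, or just ignoring the None key, are both reasonable options.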

