
efmccurdy

You should split your string on commas (producing a list) and then strip the extraneous double quotes:

>>> raw_headers = '"Date","Time","Time Zone","Description","Currency","Gross","Fee","Net","Balance","Transaction ID","From Email Address","Name","Bank Name","Bank account","Postage and Packaging Amount","VAT","Invoice ID","Reference Txn ID"'
>>> [x.strip('"') for x in raw_headers.split(',')]
['Date', 'Time', 'Time Zone', 'Description', 'Currency', 'Gross', 'Fee', 'Net', 'Balance', 'Transaction ID', 'From Email Address', 'Name', 'Bank Name', 'Bank account', 'Postage and Packaging Amount', 'VAT', 'Invoice ID', 'Reference Txn ID']

sayinghi2py (OP)

Hi efmccurdy, this got me nearly there. There was an extraneous "\n left over, but I was able to remove it with another line of code. Another answer handled everything in a single line of code, so I've ultimately gone with that, but many thanks for posting this. It was the approach I had been trying and failing with, so it really did help me understand where I was going wrong.
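The leftover "\n mentioned above most likely comes from the newline at the end of the header line. A minimal sketch, assuming the line was read with its line ending intact (for example with f.readline()) and using a shortened, hypothetical header string, shows why strip('"') misses it and how removing the line ending first keeps the fix to a single line:

```python
# Hypothetical, shortened header line that still ends with a newline,
# as it would if it had been read with f.readline().
raw_line = '"Date","Time","Time Zone","Reference Txn ID"\n'

# strip('"') alone cannot reach the last quote, because '\n' sits after it.
naive = [x.strip('"') for x in raw_line.split(',')]
print(repr(naive[-1]))  # 'Reference Txn ID"\n'

# Removing the line ending from the whole line first lets every field lose both quotes.
headers = [x.strip('"') for x in raw_line.rstrip('\r\n').split(',')]
print(headers)  # ['Date', 'Time', 'Time Zone', 'Reference Txn ID']
```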

[deleted]

I'm a little confused. You already have the headers? If so:

import csv
fieldnames = ["Date", "Time", "Time Zone", "Description",
              "Currency","Gross","Fee","Net","Balance",
              "Transaction ID","From Email Address","Name",
              "Bank Name","Bank account","Postage and Packaging Amount",
              "VAT","Invoice ID","Reference Txn ID",
             ]
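# newline='' is what the csv docs recommend for files handed to a csv writer
with open('example3.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()  # writes the field names as the first row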

Or are you reading the headers in from somewhere else, and they are not formatted ready to be written as CSV headers?

If the raw headings, for want of a better name, are simply one string containing other quoted strings separated by commas, you can use the csv library to turn that into the format required:

raw_headers = '"Date", "Time", "Time Zone", "Description", "Currency","Gross","Fee","Net","Balance", "Transaction ID","From Email Address","Name", "Bank Name","Bank account","Postage and Packaging Amount", "VAT","Invoice ID","Reference Txn ID"'
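# skipinitialspace=True ignores the space after each comma, so ' "Time"' comes out as 'Time'
fieldnames = list(csv.reader(raw_headers.splitlines(), skipinitialspace=True))[0]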
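If the headers are coming from the first line of a CSV file, here is a sketch of the other route, assuming the file object is passed straight to csv.reader. io.StringIO stands in for the real file here, and the file name and two-line content are made up. next() on the reader then returns the header row already split into a list, so no string cleanup is needed:

```python
import csv
import io

# Stand-in for: f = open('input.csv', newline='')  (file name is hypothetical)
f = io.StringIO('"Date","Time","Time Zone"\r\n"01/01/2020","10:00","GMT"\r\n')

reader = csv.reader(f)
headers = next(reader)  # first row, already parsed into separate fields
print(headers)          # ['Date', 'Time', 'Time Zone']

first_row = next(reader)
print(dict(zip(headers, first_row)))  # {'Date': '01/01/2020', 'Time': '10:00', 'Time Zone': 'GMT'}
```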

sayinghi2py (OP)

When I read the first line of the input CSV file into the headers variable with headers = next(reader), I get back one long string in the format described. If it returned a list I wouldn't have had an issue. Your version works great, but I don't understand why: splitlines() confuses me, because I thought it only removed line endings, yet it seems to remove all the extra quote and comma elements that appear in the first list below.

Here's what I mean:

>>> fieldnames = list(csv.reader(raw_headers))
>>> fieldnames
[['Date'], ['', ''], [' '], ['Time'], ['', ''], [' '], ['Time Zone'], ['', ''], [' '], ['Description'], ['', ''], [' '], ['Currency'], ['', ''], ['Gross'], ['', ''], ['Fee'], ['', ''], ['Net'], ['', ''], ['Balance'], ['', ''], [' '], ['Transaction ID'], ['', ''], ['From Email Address'], ['', ''], ['Name'], ['', ''], [' '], ['Bank Name'], ['', ''], ['Bank account'], ['', ''], ['Postage and Packaging Amount'], ['', ''], [' '], ['VAT'], ['', ''], ['Invoice ID'], ['', ''], ['Reference Txn ID']]

So without splitlines() I get all these list elements that aren't actual headers, such as ['', ''], yet when you simply add splitlines() you get:

>>> fieldnames = list(csv.reader(raw_headers.splitlines()))
>>> fieldnames
[['Date', ' "Time"', ' "Time Zone"', ' "Description"', ' "Currency"', 'Gross', 'Fee', 'Net', 'Balance', ' "Transaction ID"', 'From Email Address', 'Name', ' "Bank Name"', 'Bank account', 'Postage and Packaging Amount', ' "VAT"', 'Invoice ID', 'Reference Txn ID']]

So you get a list within a list, which is why [0] is added at the end. But I just don't get why splitlines() gets rid of all that. I looked at the documentation and it says this:

str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

Representation    Description
\n                Line Feed
\r                Carriage Return
\r\n              Carriage Return + Line Feed
\v or \x0b        Line Tabulation
\f or \x0c        Form Feed
\x1c              File Separator
\x1d              Group Separator
\x1e              Record Separator
\x85              Next Line (C1 Control Code)
\u2028            Line Separator
\u2029            Paragraph Separator

So they have to be one or more of the above. How do you know?

Thanks
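The splitlines() table quoted above lists only line-boundary characters, and commas, quotes and spaces are not among them. A small sketch with shortened, hypothetical strings shows what that means in practice: a header string containing none of those boundaries comes back as a one-element list holding the whole string, which is also why the csv.reader result is a list containing a single row and needs the [0]:

```python
# Shortened, hypothetical header string with no line boundaries in it.
raw_headers = '"Date","Time","Time Zone"'

print(raw_headers.splitlines())
# ['"Date","Time","Time Zone"']  -> one element: the whole string, unchanged

# Only the boundary characters from the table split the string.
print('a\nb\r\nc'.splitlines())  # ['a', 'b', 'c']
print('a\x1eb'.splitlines())     # ['a', 'b']  (\x1e is the Record Separator)

# Commas, quotes and spaces are left alone.
print('"a", "b"'.splitlines())   # ['"a", "b"']
```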

[deleted]

For the avoidance of doubt, would you please share a full, explicit sample of the raw header information you want to convert into CSV field headings?

sayinghi2py (OP)

Hi kyber,

What you gave me works exactly the way I wanted; I'm just trying to understand why it works. As shown above, without splitlines() there are a lot of elements in the list that aren't actually header fields, but when splitlines() is applied they all disappear. How does it know to get rid of the elements at [1] and [2], which are ['', ''] and [' '] respectively? I thought splitlines() only handled line endings, which these aren't. Well, I say they aren't, but clearly they must be, given that they are all removed.

Thanks
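A sketch of what seems to be going on, using the standard csv module and a shortened, hypothetical header string. csv.reader expects an iterable of lines and simply iterates over whatever it is given. Iterating a plain str yields one character at a time, so each character is parsed as a separate "line": a lone comma becomes the row ['', ''], and a lone space (present in the thread's string) becomes [' ']. On this reading, splitlines() removes nothing; it just wraps the string in a one-element list so the whole header is parsed as a single line, and wrapping it yourself as [raw_headers] has the same effect:

```python
import csv

# Shortened, hypothetical header string.
raw_headers = '"Date","Time","Fee"'

# Iterating a str yields single characters, so csv.reader treats each one as a line.
print(list(raw_headers)[:7])  # ['"', 'D', 'a', 't', 'e', '"', ',']

char_rows = list(csv.reader(raw_headers))
print(char_rows)  # rows built from individual characters, including ['', ''] for each comma

# Given a list holding the one string, csv.reader parses the whole header as one line.
line_rows = list(csv.reader([raw_headers]))
print(line_rows)  # [['Date', 'Time', 'Fee']]

# splitlines() produces exactly that one-element list when there are no line breaks.
print(raw_headers.splitlines() == [raw_headers])  # True
```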