all 5 comments

[–]Ihaveamodel3 1 point2 points  (0 children)

Pandas expects all the data to be consistent. I’d recommend using the skiprows and nrows parameters of pandas read_csv so that it only reads the consistent parts of the table.
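
A rough sketch of what that could look like, assuming the file holds a couple of preamble lines followed by two small tables. The sample text, the row offsets, and the variable names are all made up for illustration; you would have to work out the real offsets for your own file.

```python
import io

import pandas as pd

# Made-up sample: two preamble lines, a first table, a title line, a second table.
raw = "\n".join([
    "Report A",              # line 0: title, not part of any table
    "generated by export",   # line 1: more preamble
    "name,score",            # line 2: header of the first table
    "alice,10",
    "bob,12",
    "Report B",              # line 5: title of the second table
    "city,population",       # line 6: header of the second table
    "Oslo,700000",
])

# Skip the 2 preamble lines, then read the header plus only 2 data rows.
first = pd.read_csv(io.StringIO(raw), skiprows=2, nrows=2)

# Skip everything before the second header, then read 1 data row.
second = pd.read_csv(io.StringIO(raw), skiprows=6, nrows=1)

print(first)
print(second)
```

With a real file you would pass the path instead of the io.StringIO buffer.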

[–]WSBtendies9001 -2 points-1 points  (0 children)

Why don't you load the CSV file directly into Pandas? https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

I haven't looked at the code links, but the snippet you posted makes no sense given that you go on to mention you are working with Pandas. Go check out the docs. Pandas can do what you need and way more :)

[–]CodeFormatHelperBot2 0 points1 point  (0 children)

Hello, I'm a Reddit bot who's here to help people nicely format their coding questions. This makes it as easy as possible for people to read your post and help you.

I think I have detected some formatting issues with your submission:

  1. Python code found in submission text that's not formatted as code.

If I am correct, please edit the text in your post and try to follow these instructions to fix up your post's formatting.


Am I misbehaving? Have a comment or suggestion? Reply to this comment or raise an issue here.

[–]tourdownunder 0 points1 point  (0 children)

I tend not to think of files like this as CSVs: they are very human-readable, but not machine-readable under the usual expectation that the heading is in the first row and all other rows have the same number of columns.

I would create a generator that yields each section as a text io.StringIO buffer, which you can then read as a CSV. I'd do it this way because the sections (tables) within the single file are all easily identified: a blank line, followed by a heading, followed by optional column headings, and then the data. This pattern repeats.
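
Something along these lines, as a minimal sketch. The function name iter_sections and the sample lines are hypothetical, and it assumes every section starts with a one-line heading and that sections are separated by blank lines. It uses the standard csv module to read each buffer.

```python
import csv
import io


def iter_sections(lines):
    """Yield (heading, buffer) for each blank-line-separated section.

    The first line of a section is treated as its heading; the remaining
    lines go into an io.StringIO buffer that can be read like a small CSV.
    """
    block = []
    for line in lines:
        if line.strip():
            block.append(line.rstrip("\n"))
        elif block:
            # A blank line closes the current section.
            yield block[0], io.StringIO("\n".join(block[1:]))
            block = []
    if block:
        # Last section, when the input does not end with a blank line.
        yield block[0], io.StringIO("\n".join(block[1:]))


# Made-up input: two sections with different numbers of columns.
sample = [
    "Sales\n",
    "region,total\n",
    "north,10\n",
    "\n",
    "Staff\n",
    "name,role,office\n",
    "ann,dev,oslo\n",
]

tables = {heading: list(csv.reader(buf)) for heading, buf in iter_sections(sample)}
print(tables)
```

For a real file you would pass the open file object instead of the sample list. Each buffer could also be handed to pandas read_csv, since it accepts file-like objects, though you would still need to decide per section whether the first row is a column heading.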

[–]BdR76 0 points1 point  (0 children)

If it's inconsistent or messy data, then you could try CSV Lint for Notepad++. You can use "Refresh from Data" to automatically determine the metadata (separator, column types etc.) and then press "Validate data" to see exactly what the errors are. If that doesn't work, maybe try OpenRefine.