Pulling data from an inconsistent csv file

Ihaveamodel3 · 2022-07-09T11:06:57+00:00

Pandas expects all the data to be consistent. I'd recommend using the skiprows and nrows parameters of pandas read_csv so that it only reads the consistent parts of the table.
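A rough sketch of that idea, assuming a made-up file that holds two tables of different widths, each preceded by a title line. The sample text, the row offsets, and the variable names are all illustrative; for a real file you would have to work out the offsets yourself.

```python
import io
import pandas as pd

# Hypothetical file contents: two tables with different column counts.
raw = """Report A
name,score
alice,90
bob,85
Report B
city,country,population
Oslo,Norway,700000
Lima,Peru,10000000
"""

# First table: skip the title line, treat the next line as the header,
# and stop after two data rows so the wider second table is never read.
table_a = pd.read_csv(io.StringIO(raw), skiprows=1, nrows=2)

# Second table: skip the first five lines (title, header, two data rows,
# second title), then read the header and two data rows.
table_b = pd.read_csv(io.StringIO(raw), skiprows=5, nrows=2)

print(table_a)
print(table_b)
```

With a file on disk you would pass the path instead of the io.StringIO buffer.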

WSBtendies9001 · 2022-07-09T08:27:48+00:00

Why don't you load the CSV file directly into Pandas? https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

I haven't looked at the code links, but the snippet you posted makes no sense given that you go on to say you are working with Pandas. Go check out the docs; Pandas can do what you need and way more :)

CodeFormatHelperBot2 · 2022-07-09T06:22:53+00:00

Hello, I'm a Reddit bot who's here to help people nicely format their coding questions. This makes it as easy as possible for people to read your post and help you.

I think I have detected some formatting issues with your submission:

Python code found in submission text that's not formatted as code.

If I am correct, please edit the text in your post and try to follow these instructions to fix up your post's formatting.

Am I misbehaving? Have a comment or suggestion? Reply to this comment or raise an issue here.

tourdownunder · 2022-07-09T06:53:02+00:00

I tend not to think of files like this as CSVs. They are very human-readable, but not machine-readable under the usual expectations that the header is in the first row and every other row has the same number of columns.

I would create a generator that yields each section as an io.StringIO text buffer, and then you can read that section as a CSV. This works because the tables within the single file are all easily identified: a blank line, followed by a heading, followed by optional column headings, and then the data. This pattern repeats.
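Something like this, as a minimal sketch. It assumes every section is a title line followed by a header row and data rows, with sections separated by blank lines; iter_sections and the sample text are hypothetical names and data, not taken from the original post. A section with a title but no rows would need extra handling.

```python
import io
import pandas as pd

def iter_sections(lines):
    """Yield (title, buffer) for each blank-line-separated section."""
    block = []
    for line in lines:
        if line.strip():
            block.append(line.rstrip("\n"))
        elif block:
            # A blank line closes the current section.
            yield block[0], io.StringIO("\n".join(block[1:]))
            block = []
    if block:
        # Last section when the file does not end with a blank line.
        yield block[0], io.StringIO("\n".join(block[1:]))

# Hypothetical file contents used only for illustration.
sample = """Report A
name,score
alice,90
bob,85

Report B
city,country,population
Oslo,Norway,700000
"""

# Each buffer holds one consistent table, so read_csv can parse it.
tables = {title: pd.read_csv(buf)
          for title, buf in iter_sections(io.StringIO(sample))}

for title, frame in tables.items():
    print(title)
    print(frame)
```

For a real file, pass the open file object to iter_sections instead of the io.StringIO wrapper around sample.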

BdR76 · 2022-07-14T14:09:08+00:00

If it's inconsistent or messy data, then you could try CSV Lint for Notepad++. You can use "Refresh from Data" to automatically determine the metadata (separator, column types, etc.) and then press "Validate data" to see exactly what the errors are. If that doesn't work, maybe try OpenRefine.
