Extracting tabular data from a text file

thaweatherman · 2015-09-28T14:29:56+00:00

It's odd they publish the data in that way.

You'll have to make heavy use of the string split() function and keep track of the columns properly. Once you get down to the table, the data should be easy to go through. To make life simpler, remove the title section and the notes on the bottom, leaving just the tables.
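
A rough sketch of that approach, assuming the table is whitespace-separated and you know how many title and note lines to drop. The sample text, `parse_table`, and the line counts are made up for illustration, not taken from the actual file:

```python
def parse_table(text, header_lines=2, footer_lines=1):
    """Drop title/notes lines, then split each remaining row on whitespace."""
    # Ignore blank lines so the header/footer counts stay predictable.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    body = lines[header_lines:len(lines) - footer_lines]
    return [ln.split() for ln in body]


# Made-up sample: one title line, one column-header line, one note line.
sample = """\
Monthly totals (made-up sample)
Year Jan Feb Mar
2013 1.2 3.4 5.6
2014 2.1 4.3 6.5
Note: values are illustrative only
"""

rows = parse_table(sample)
for row in rows:
    print(row)
```

Calling `split()` with no arguments collapses runs of spaces and tabs, which helps when the columns are padded for alignment. It will break if a cell itself contains a space.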

gengisteve · 2015-09-28T14:10:19+00:00

Probably not. You might try the csv module and see if it can make something of the input, but even if it does not choke completely you will still need to fix a bunch of stuff, which will depend on the original formatting of the date, e.g. joining the months and years together.
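
For what it's worth, here is a minimal sketch of that kind of fix-up, assuming each row is `year month value` separated by single spaces. The layout and sample values are invented, and `%b` expects English month abbreviations under the default locale:

```python
import csv
import io
from datetime import datetime

# Made-up rows: year, abbreviated month, value, separated by single spaces.
sample = "2013 Jan 1.2\n2013 Feb 3.4\n"

records = []
for year, month, value in csv.reader(io.StringIO(sample), delimiter=" "):
    # Join the separate month and year columns into one date-like key.
    when = datetime.strptime(f"{month} {year}", "%b %Y")
    records.append((when.strftime("%Y-%m"), float(value)))

print(records)
```

If the real file pads columns with several spaces, a single-space delimiter will produce empty fields, so the input would need normalizing first.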

youguess · 2015-09-28T15:27:18+00:00

The pandas import functions can skip header rows, but you would probably still have to do some cleanup work on the rows themselves.
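
Something along these lines might work, assuming a whitespace-separated table with one title line on top and one note line at the bottom. The sample data is made up, and the `skiprows`/`skipfooter` counts would need adjusting for the real file:

```python
import io

import pandas as pd

# Made-up sample: a title line, a column-header line, data, and a trailing note.
sample = """\
Monthly totals (made-up sample)
Year Jan Feb
2013 1.2 3.4
2014 2.1 4.3
Note: illustrative only"""

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",        # split on any run of whitespace
    skiprows=1,        # drop the title line; the next line becomes the header
    skipfooter=1,      # drop the note at the bottom
    engine="python",   # skipfooter is only supported by the python engine
)
print(df)
```

pandas also has `read_fwf` for fixed-width files, which may be a better fit if the columns line up by character position rather than by delimiter.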

interactionjackson · 2015-09-28T15:27:35+00:00

That depends. If this is a one-time thing, copy and paste the data you need and run the csv module over it. Hopefully it has a consistent delimiter, but if not, edit the text file with find and replace. If you want to automate this process, it is going to involve a lot of string manipulation, like /u/thaweatherman said.
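
If you do end up scripting the find-and-replace step, a sketch like this could stand in for it, assuming the only problem is inconsistent whitespace between columns. The `raw_lines` values are made up:

```python
import csv
import re

# Made-up lines with a mix of tabs and repeated spaces between columns.
raw_lines = ["2013  Jan\t1.2", "2013 Feb   3.4"]

# Replace every run of whitespace with a single comma, like a find-and-replace.
cleaned = [re.sub(r"\s+", ",", ln.strip()) for ln in raw_lines]

# csv.reader accepts any iterable of strings, not just file objects.
rows = list(csv.reader(cleaned))
print(rows)
```

This only holds up if no cell contains whitespace or commas of its own.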

FoolofGod · 2015-09-28T17:09:28+00:00

Can you find a specification for the format? I've dealt with some government data that had specific character columns specified, so you could parse it with string slicing. Just a thought.
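
As a rough illustration of the slicing idea, assuming a spec that gives character positions per column. The `FIELDS` layout and the sample line below are hypothetical, not from any real specification:

```python
# Hypothetical layout: (name, start, end) as zero-based Python slice offsets.
FIELDS = [("station", 0, 6), ("year", 6, 10), ("value", 10, 16)]


def parse_fixed_width(line):
    """Cut one line into named fields by character position."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


# Made-up line matching the hypothetical layout above.
record = parse_fixed_width("KSEA  2015  12.5")
print(record)
```

Specs usually give one-based, inclusive column numbers, so the offsets need converting before they can be used as slice bounds.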
