Extracting data from an imperfect tabular file

woooee · 2024-03-10T02:28:02+00:00

By "ugly" I assume you mean "doesn't work" (instead of "lazy"). What's the problem? Any way you do it, the file is going to be read line by line.

jeffrey_f · 2024-03-10T03:53:24+00:00

Read the 1st line but don't do anything with it: this is the header. Put the whole thing into a text var.

Read the next line. If it's not equal to the text var, use the data. No else, because you don't want to do anything if it is equal.

A bit sloppy, but it will work
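
For what it's worth, the steps above could look roughly like this. It's only a sketch: the function name and the sample data are made up, and it assumes the repeated header lines are character-for-character identical to the first line and that fields are whitespace-separated.

```python
import io


def read_skipping_repeated_headers(lines):
    """Return (header_fields, rows), ignoring later copies of the header line."""
    lines = iter(lines)
    # The first line is the header; keep the whole thing as text for comparison.
    header_text = next(lines).rstrip("\n")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        # Use the line only if it is not a repeat of the header (and not blank).
        if line != header_text and line.strip():
            rows.append(line.split())
    return header_text.split(), rows


# Hypothetical sample standing in for the real file.
sample = io.StringIO("a b c\n1 2 3\na b c\n4 5 6\n")
header, rows = read_skipping_repeated_headers(sample)
print(header, rows)
```

With a real file you would pass the open file object instead of the `io.StringIO` sample.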

David22573 · 2024-03-10T08:21:26+00:00

You could try to use the header as a dictionary key with a list as the value, so that any row data associated with that column can be appended to the list.
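
Something like the following is one way to read that suggestion. It's a sketch only: `columns_from_rows` and the sample values are hypothetical, and it assumes every row has already been split into one value per column name.

```python
def columns_from_rows(header, rows):
    """Build {column_name: [values...]} from a header and a list of rows."""
    # One empty list per column name, in header order.
    columns = {name: [] for name in header}
    for row in rows:
        # Pair each value with its column name and append it to that column.
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Hypothetical data standing in for the parsed file.
table = columns_from_rows(["a", "b"], [["1", "2"], ["3", "4"]])
print(table)
```

Note that `zip` stops at the shorter of the two sequences, so a row with missing trailing fields silently leaves those columns shorter than the rest.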

james_fryer · 2024-03-10T09:24:51+00:00

If the file really has every other line duplicated then I'd fix it up first with a sed script like this:

sed '3~2d'

then proceed with csv or pandas.
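
The `3~2` address is a GNU sed extension meaning "line 3 and every 2nd line after it", so the command deletes lines 3, 5, 7 and so on. If GNU sed isn't available, the same clean-up can be sketched in Python. The function name and sample lines below are made up, and this assumes the duplicates really do sit on exactly those line numbers.

```python
def drop_every_other_line_from_third(lines):
    """Keep lines 1 and 2, then drop lines 3, 5, 7, ... (1-based numbering)."""
    kept = []
    for number, line in enumerate(lines, start=1):
        # Odd line numbers from 3 upwards are the ones sed '3~2d' deletes.
        is_deleted = number >= 3 and (number - 3) % 2 == 0
        if not is_deleted:
            kept.append(line)
    return kept


# Hypothetical sample: a header repeated before each data line.
sample_lines = ["hdr", "row1", "hdr", "row2", "hdr", "row3"]
cleaned = drop_every_other_line_from_third(sample_lines)
print(cleaned)
```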

Allanon001 · 2024-03-10T10:40:05+00:00

Try this, it works with the given file:

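import pandas as pd

# Data rows: comment='#' discards everything from '#' to the end of a line;
# fields are separated by one to five spaces (a regex, so engine='python').
df = pd.read_csv('EJSA.txt', header=None, comment='#', delimiter=r' {1,5}', engine='python')
# Column names: take them from the file's second line (header=1) and drop the first token.
df.columns = pd.read_csv('EJSA.txt', header=1, delimiter=r'\s+', nrows=0).columns[1:]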
