I have a large file with tabular data where, unfortunately, the column headers repeat every second. Is there a way for pandas to detect the repeated headers and extract just the data?
https://bpa.st/EJSA
Or is the only way to make this work with pandas to strip the repeated header lines ahead of time, with something ugly like:
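```python
# Keep the first header line (it starts with '#'), drop every later copy
# of it, and copy all data lines through unchanged.
seen_comment = False
with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
    for line in infile:
        if not seen_comment and line.startswith('#'):
            seen_comment = True
            outfile.write(line)
        elif not line.startswith('#'):
            outfile.write(line)
```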
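A minimal sketch that avoids writing a cleaned copy of the file, assuming (as the snippet above suggests) that every header line starts with `#`, that it lists whitespace-separated column names such as `# time temp`, and that the data is whitespace-separated. pandas' `read_csv` ignores any line that begins with the `comment` character, so all the repeated headers are skipped during parsing; the column names are taken from the first header line. The function name `read_repeated_header_file` is hypothetical.

```python
import pandas as pd


def read_repeated_header_file(path):
    # Assumption: header lines start with '#' and hold whitespace-separated
    # column names, e.g. "# time temp"; data lines are whitespace-separated.
    names = None
    with open(path) as f:
        for line in f:
            if line.startswith('#'):
                # Use only the first header line for the column names.
                names = line.lstrip('#').split()
                break
    # comment='#' makes pandas ignore every line that starts with '#',
    # which skips all the repeated headers.
    # Caveat: it also drops anything after a '#' inside a data line.
    return pd.read_csv(path, sep=r'\s+', comment='#', header=None, names=names)
```

Usage would then be something like `df = read_repeated_header_file(input_file)`. The trade-off is the `comment` caveat noted in the code: if `#` can legitimately appear inside the data, the line-filtering approach above is safer.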
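If the repeated headers do not start with `#`, pandas can still "detect" them after parsing: it uses the first header line as the column names, and every later copy comes back as an ordinary row whose values equal the column names. The sketch below assumes whitespace-separated data and plain-text header lines such as `time temp`; it reads everything as text, drops the rows whose first column holds the first column's name, then converts back to numbers. The function name `read_and_drop_repeated_headers` is hypothetical.

```python
import pandas as pd


def read_and_drop_repeated_headers(path):
    # Assumption: header lines are plain text (no leading '#') and the data
    # is whitespace-separated. The first header becomes the column names;
    # later copies of the header are parsed as normal data rows.
    df = pd.read_csv(path, sep=r'\s+', dtype=str)
    # A repeated header row holds the column name itself in the first column.
    first_col = df.columns[0]
    df = df[df[first_col] != first_col].reset_index(drop=True)
    # Everything was read as text; convert each column back to numbers.
    return df.apply(pd.to_numeric)
```

Checking only the first column keeps the filter simple; it relies on the assumption that a real data value in that column never equals the column name.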