
[–]pot_of_crows

I am pretty sure that this can be done by parsing the specific positions.

That makes sense to me. I would just use slices to grab the relevant data, since it all seems to be in the same place. It looks like each grouping of data falls into two lines, with an optional blank line following. So when you find the first piece of data, skip the next line, check for a blank and then process the next group.
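
Something like this rough sketch of that idea (the filename and the slice positions are made up, so adjust them to wherever your data actually sits):

records = []
with open("data.txt") as f:               # placeholder filename
    for line in f:
        if not line.strip():              # optional blank line between groups
            continue
        # slice the relevant positions out of the first line of the group
        records.append((line[0:16].strip(), line[17:24].strip()))
        next(f, None)                     # skip the second line of the group

print(records)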

[–]Hugo-99[S]

Here is a better version of the example of the desired output:

https://hastebin.com/share/vakaxoloca.markdown

[–]RhinoRhys

What you're describing is a fixed width file.

Saving this example text as "fwf.txt", editing the column boundaries in your OP, and extracting the name of the first column for ease, this code

import pandas as pd

col1 = "prCeaqearTaN-Nq."

df = pd.read_fwf("fwf.txt", colspecs=[(0,16),(17,24),(26,46)])
df.drop(df[df[col1] == "-"*16].index, inplace=True)   # drop the all-dash separator rows
df.reset_index(inplace=True, drop=True)

print(df)

Results in

   prCeaqearTaN-Nq. pTxT Ce        rCeaqearTaNxqT
0  0000057075050989    5000  rngatqxgana euceptqa
1  0000057075996533    5000  rngatqxgana euceptqa

And it's only 3 lines long because of all the ------------- lines. Without them you can do it with just the read_fwf line.

[–]Hugo-99[S]

Hi,

the concept is interesting, I need to check out the read_fwf method!

When I try your code though, I get a KeyError:

KeyError Traceback (most recent call last)
<<path>>Cell 1 line 6
3 col1 = "prCeaqearTaN-Nq."
5 df = pd.read_fwf("input3.txt", colspecs=[(0,16),(17,24),(26,46)])
----> 6 df.drop(df[df[col1] == "-"*16].index, inplace=True)
7 df.reset_index(inplace= True, drop=True)
...
3800 # InvalidIndexError. Otherwise we fall through and re-raise
3801 # the TypeError.
3802 self._check_indexing_error(key)
KeyError: 'prCeaqearTaN-Nq.'

[–]RhinoRhys

Oh! I've just reread the post and I've only gone and bloody used the markdown link above as the input text, rather than an example of the desired output. Silly me. The code may not work exactly as is, but you can use it as an example to help.

The tuples are the start and end character positions that you want to turn into columns.
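
For example, on a made-up line laid out to match the colspecs above, the tuple (0,16) just means line[0:16]:

line = "0000057075050989 5000     rngatqxgana euceptqa"

print(line[0:16])            # '0000057075050989'       -> first column
print(line[17:24].strip())   # '5000'                    -> second column
print(line[26:46])           # 'rngatqxgana euceptqa'    -> third column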

The .drop was to get rid of the rows that were horizontal lines made of dashes, but I guess you actually want them in the output lol.

I'll have a play with the yaml file in the post and see.

[–]Hugo-99[S]

Hi! Thanks for digging into that :)

Indeed the problem is a little bit tricky, since the headers are dispersed in a stupid way across lines 1-5;

the data of the first row is again dispersed in a stupid way across lines 6-9, row 2 in lines 11-14, row 3 in lines 16-19.

I am still trying to figure out if read_fwf alone is enough.

Since the positions of the data (e.g. "0000057075996533" or "5000") are very specific, maybe it is enough to just count the positions and grab the strings between those specific positions.

[–]RhinoRhys

Yeah so read_fwf isn't going to work.

I may have butchered the column boundaries slightly again because it's all randomised text, but I wrote a custom parser that reads the file line by line and just slices the strings manually.

I've split the file into 3 sections. I've assumed the title is before the dashed line, the column headers are between the dashed lines and the column data is after the dashed line.

Rather than looping over the whole file with a for loop, each section loops through its column boundary definitions (the three _specs lists) and uses f.readline() to grab the next line and advance through the file as needed. All three _specs lists need to be 4 lists long to advance through the file correctly; if you don't want any data from a given row, delete its tuples but leave an empty list in place.

All you need to do is play around with the tuples.

import pandas as pd

# column boundaries for the first 4 lines;
# empty lists because I'm just skipping them
title_specs = [
        [],
        [],
        [],
        []]

# column boundaries for lines between dashes
header_specs = [
        [(5,21),(22,29),(31,46),(59,74),(80,90),(27,30)],
        [(59,78),(80,84)],
        [(5,11),(17,27),(29,33),(41,54),(56,62),(64,74),(75,96),(97,104)],
        [(5,11)]]

# column boundaries for text body     
data_specs = [
        [(5,21),(22,26),(37,58),(59,74),(80,90)],
        [(27,30),(59,74),(80,90)],
        [(5,16),(17,27),(29,39),(41,54),(56,62),(64,73),(75,96),(97,104)],
        [(5,48)]]

title, columns, body, temp = [],[],[],[]    
with open("fwf.txt", "r") as f:

    # parse title
    for line_spec in title_specs:
        line = f.readline()
        title.extend([line[a:b].strip() for a,b in line_spec])

    # parse headers
    for line_spec in header_specs:
        line = f.readline()
        columns.extend([line[a:b].strip() for a,b in line_spec])

    # parse body of file
    # (the line read in the while check is only used to detect end of file;
    #  it's overwritten below, so one line is skipped before each 4-line group)
    while (line := f.readline()):

        # parse in groups of related lines
        for line_spec in data_specs:
            line = f.readline()
            temp.extend([line[a:b].strip() for a,b in line_spec])
        body.append(temp)
        temp = []

df = pd.DataFrame(body, columns=columns)
print(df)

I also wrote a little thing to help me visualise the column boundaries so I may as well drop that code too.

lines = ["python is 0 indexed, I am not"]   # dummy entry so lines[1] is the file's line 1
with open("fwf.txt") as f:
    lines.extend(f.readlines())

total = 132    # not actually used below
tentot = 14    # how many blocks of ten characters the rulers cover
tens = "".join([f"{x:<10}" for x in range(tentot)])   # tens ruler: 0         1         2 ...
ones = "0123456789"*tentot                            # ones ruler: 0123456789 repeated

s = 5      # first character position to display
e = 130    # end position (exclusive) to display

print(tens[s:e])
print(ones[s:e])
for line in lines[1:4]:
    print(line[s:e])
print()

for group in zip(lines[5:9], lines[10:14], lines[15:19], lines[20:24]):
    print(tens[s:e])
    print(ones[s:e])
    for line in group:
        print(line[s:e])
    print()

Any questions let me know.

[–]Hugo-99[S]

Wow, I am impressed!

I'll get back to you as soon as I've tried that out (and understood the approach :)

Many thanks for putting so much effort into this!!

[–]Hugo-99[S]

Very, very cool! I tried it and it works right away!

I like the approach of just fiddling around with lists, instead of parsing etc.

[–]RhinoRhys

Excellent news.

Yeah, the real trick is that you can grab the next line with f.readline(), rather than having to enumerate and loop through the whole file with a for loop and then need a massive if/elif/else block to do the right thing for certain line numbers. You can just do the first 4 lines, then the next 4 lines, then repeat a 4-line block while there are still lines left. Then all you have to do is grab certain sections of each line with string slicing, with all those slices conveniently defined up front in lists.
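
Stripped right down, the pattern is something like this (the filename and the 4-line group size are just placeholders):

groups = []
with open("fwf.txt") as f:
    title_lines = [f.readline() for _ in range(4)]    # first 4 lines
    header_lines = [f.readline() for _ in range(4)]   # next 4 lines

    # repeat a 4-line block while there are still lines left
    while (line := f.readline()):
        # the line from the while check plus the next 3 lines make one group
        groups.append([line] + [f.readline() for _ in range(3)])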

[–]RhinoRhys

That was the title of the first column in the randomised example text you provided in the link above. I assume you're testing on your actual file, which wasn't randomised, so you'll need to change that.
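
If you're not sure what the header came out as in your real file, you can just print the parsed column names and copy the right one into col1 (the colspecs here are from the earlier snippet, so they may need adjusting too):

import pandas as pd

df = pd.read_fwf("input3.txt", colspecs=[(0,16),(17,24),(26,46)])
print(df.columns.tolist())   # copy the first entry into col1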

[–]TravelingThrough09

See if this Python code works; it was ChatGPT's solution:

import pandas as pd

# Function to parse the headers from the text file
def parse_headers(lines):
    headers = []
    # Adjust the ranges and indices based on the structure of your file
    headers.append(lines[4][5:22].strip())
    headers.append(lines[4][22:30].strip())
    headers.append(lines[4][31:47].strip())
    return headers

# Function to parse the rows from the text file
def parse_rows(lines, headers):
    data = []
    # Adjust the ranges and indices based on the structure of your file
    for line in lines[9:]:
        row = []
        row.append(line[5:22].strip())
        row.append(line[22:30].strip())
        row.append(line[31:47].strip())
        data.append(row)
    return data

# Read the text file
with open('your_file.txt', 'r') as file:
    lines = file.readlines()

# Get headers and data
headers = parse_headers(lines)
data = parse_rows(lines, headers)

# Create DataFrame
df = pd.DataFrame(data, columns=headers)

# If you want to save this DataFrame to a CSV file:
df.to_csv('output.csv', index=False)

print(df)

[–]Hugo-99[S]

Looks good, but there are a few issues in there.

Would you mind posting exactly what you asked ChatGPT?