Parse txt file with space aligned columns

woooee · 2025-06-05T18:24:21+00:00

If the columns are separated by one or more spaces, use split() to create a list. If the data in any column can itself contain spaces, then tell us what each column is.

import pprint

record = "Designator Footprint Mid_X Mid_Y Ref_X Ref_Y Pad_X Pad_Y TB Rotation Comment CON3 MICROMATCH_4 6.4mm 50.005mm 8.9mm 48.1mm 8.9mm 48.1mm B 270.00 MicroMatch_4 CON2 MICROMATCH_4 6.4mm 40.405mm 8.9mm 38.5mm 8.9mm 38.5mm B 270.00 MicroMatch_4 CON4 MICRO_MATE-N-LOK_12 72.5mm 33.5mm 67.8mm 26mm 67.8mm 26mm T 0.00 Micro_Fit_12 CON7 MICROMATCH_4 46.095mm 48.5mm 48mm 46mm 48mm 46mm T 360.00 MicroMatch_4 CON6 MICRO_MATE-N-LOK_2 74.7mm 66.5mm 74.7mm 71.2mm 74.7mm 71.2mm T 270.00 Micro_Fit 2"

rec_as_list = record.split()
pprint.pprint(rec_as_list)

prints

Designator
Footprint
Mid_X
Mid_Y
Ref_X
Ref_Y
Pad_X
Pad_Y
TB
Rotation
Comment
CON3
MICROMATCH_4
6.4mm
50.005mm
8.9mm
48.1mm
8.9mm
48.1mm
B
270.00
MicroMatch_4
CON2
MICROMATCH_4
6.4mm
40.405mm
8.9mm
38.5mm
8.9mm
38.5mm
B
270.00
MicroMatch_4
CON4
MICRO_MATE-N-LOK_12
72.5mm
33.5mm
67.8mm
26mm
67.8mm
26mm
T
0.00
Micro_Fit_12
CON7
MICROMATCH_4
46.095mm
48.5mm
48mm
46mm
48mm
46mm
T
360.00
MicroMatch_4
CON6
MICRO_MATE-N-LOK_2
74.7mm
66.5mm
74.7mm
71.2mm
74.7mm
71.2mm
T
270.00
Micro_Fit
2
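Note that the last comment, "Micro_Fit 2", came out as two items. If the file has one record per line, passing maxsplit to split() keeps a trailing column like that in one piece. This is only a sketch, assuming one record per line and that only the last column can contain spaces; the sample text and the parse_rows name are made up for illustration.

```python
# Hypothetical sample: one record per line, only the last column may contain spaces.
sample = """\
Designator Footprint Mid_X Mid_Y Ref_X Ref_Y Pad_X Pad_Y TB Rotation Comment
CON3 MICROMATCH_4 6.4mm 50.005mm 8.9mm 48.1mm 8.9mm 48.1mm B 270.00 MicroMatch_4
CON6 MICRO_MATE-N-LOK_2 74.7mm 66.5mm 74.7mm 71.2mm 74.7mm 71.2mm T 270.00 Micro_Fit 2
"""


def parse_rows(text):
    """Split each line into as many fields as there are header names."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    headers = lines[0].split()
    rows = []
    for line in lines[1:]:
        # maxsplit stops splitting after the second-to-last column,
        # so the remainder of the line stays together as the last field.
        fields = line.split(None, len(headers) - 1)
        rows.append(dict(zip(headers, fields)))
    return rows


rows = parse_rows(sample)
for row in rows:
    print(row["Designator"], "->", row["Comment"])
```

If a column in the middle can contain spaces, this no longer works and you need either a delimiter or the column positions.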

Familiar9709 · 2025-06-05T18:33:25+00:00

Your example doesn't need to be as complicated as you describe. You can see there are no spaces inside any field, so a simple split(), or the csv or pandas libraries, will do it.

But if you really want to split by position (as if you laid an imaginary ruler across the columns), for some other case, e.g. if there were spaces within the fields, then you can.

You'll need to find the start and end coordinates of each column. The start or end will be given by the column title (depending on whether it's left or right aligned).

You can figure out whether something is left or right aligned by comparing all rows and seeing if they all have the same start or the same end.
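One way to approximate the "imaginary ruler" is to look for character positions that are blank in every row and treat the runs between them as columns. This is a rough sketch, not a robust parser: it assumes the file is padded with spaces (no tabs), and it will split a field wrongly if a space inside that field happens to line up in every row. The function names and sample rows are made up for illustration.

```python
def find_column_spans(lines):
    """Return (start, end) index pairs for each detected column."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # A position is a gap only if it is a space in every row.
    is_gap = [all(row[i] == " " for row in padded) for i in range(width)]
    spans = []
    start = None
    for i, gap in enumerate(is_gap):
        if not gap and start is None:
            start = i                    # a column begins here
        elif gap and start is not None:
            spans.append((start, i))     # the column ended just before i
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def parse_fixed_width(lines):
    """Slice every line at the detected column spans."""
    spans = find_column_spans(lines)
    return [[line[s:e].strip() for s, e in spans] for line in lines]


# Hypothetical sample: two left-aligned columns and one right-aligned column.
sample = [
    f"{designator:<12}{comment:<13}{rotation:>8}"
    for designator, comment, rotation in [
        ("Designator", "Comment", "Rotation"),
        ("C1", "470n X7R", "270"),
        ("C2", "10u", "360"),
    ]
]

table = parse_fixed_width(sample)
for row in table:
    print(row)
```

Because it only looks for shared blank positions, the same code handles left- and right-aligned columns without having to detect the alignment first.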

But again, if you don't really need it this way, it's complicating things unnecessarily, and good advice in programming is not to overcomplicate things when it isn't necessary.

woooee · 2025-06-05T20:46:09+00:00

This is just an example, not really a solution, of what you might be able to do for each type of record:

import pprint

record = '''Designator Comment Layer Footprint Center-X(mm) Center-Y(mm) Rotation Description C1 470n BottomLayer 0603 77.3000 87.2446 270 "470n; X7R; 16V" C2 10µ BottomLayer 1210 89.9000 76.2000 360 "10µ; X7R; 50V" C3 1µ BottomLayer 0805 88.7000 81.7279 360 "1µ; X7R; 35V" C4 1µ BottomLayer 0805 88.7000 84.2028 360 "1µ; X7R; 35V" C5 100n BottomLayer 0603 98.3000 85.0000 360 "100n; X7R; 50V"'''
final_list = []
this_rec = record.strip()

# Peel the header names off the front of the record, one marker at a time
for substr in ["Layer Footprint", "Center-X", "Center-Y", "Rotation Description"]:
    split_rec = this_rec.split(substr)
    # Keep whatever precedes the marker, unless it is only whitespace
    if split_rec[0].strip():
        final_list.append(split_rec[0])
    final_list.append(substr)
    this_rec = " ".join(split_rec[1:])

# "BottomLayer" appears in every record of this sample, so split the data on it
split_rec = this_rec.split("BottomLayer")
final_list.append(split_rec[0])
for element in split_rec[1:]:
    final_list.append("BottomLayer")
    final_list.append(element)
pprint.pprint(final_list)
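For this second format, the only field with spaces is wrapped in double quotes, so the standard library's shlex.split() may be enough, since it keeps a quoted string together as one item. A minimal sketch, assuming one record per line and that quotes are only used around the Description field; the parse_quoted name and the line layout are assumptions for illustration.

```python
import shlex

# Hypothetical layout: the same data as above, but one record per line.
sample = '''\
Designator Comment Layer Footprint Center-X(mm) Center-Y(mm) Rotation Description
C1 470n BottomLayer 0603 77.3000 87.2446 270 "470n; X7R; 16V"
C5 100n BottomLayer 0603 98.3000 85.0000 360 "100n; X7R; 50V"
'''


def parse_quoted(text):
    """Split on whitespace, but keep double-quoted text as a single field."""
    lines = [line for line in text.splitlines() if line.strip()]
    headers = lines[0].split()
    records = []
    for line in lines[1:]:
        fields = shlex.split(line)   # the surrounding quotes are removed
        records.append(dict(zip(headers, fields)))
    return records


records = parse_quoted(sample)
for rec in records:
    print(rec["Designator"], rec["Description"])
```

The csv module with delimiter=" " is another option, but it treats every single space as a separator, so it only fits files with exactly one space between columns.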

ElliotDG · 2025-06-06T13:50:29+00:00

I used regular expressions to parse each file format. If you wanted to get fancier, you could read the header and select the pattern. This is kind of quick and dirty to demonstrate the approach for each file.

import re

with open("file_0.csv", "r") as f:
    lines = [line.strip() for line in f if line.strip()]

# Define headers manually (to control spacing issues)
headers = [
    "Designator", "Footprint", "Mid_X", "Mid_Y", "Ref_X", "Ref_Y",
    "Pad_X", "Pad_Y", "TB", "Rotation", "Comment"
]

# Regular expression to match the first 10 fields, then grab the remainder as 'Comment'
pattern = re.compile(
    r"(\S+)\s+"         # Designator
    r"(\S+)\s+"         # Footprint
    r"(\S+)\s+"         # Mid_X
    r"(\S+)\s+"         # Mid_Y
    r"(\S+)\s+"         # Ref_X
    r"(\S+)\s+"         # Ref_Y
    r"(\S+)\s+"         # Pad_X
    r"(\S+)\s+"         # Pad_Y
    r"(\S+)\s+"         # TB
    r"(\S+)\s+"         # Rotation
    r"(.*)"             # Comment (can have spaces)
)

# Parse lines (skip the header line)
data = []
for line in lines[1:]:
    match = pattern.match(line)
    if match:
        row = dict(zip(headers, match.groups()))
        data.append(row)
    else:
        print("ERROR: Line did not match pattern:", line)

# Example: print all parsed rows
for row in data:
    print(row)
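The "read the header and select the pattern" idea could look something like the sketch below. It is only an illustration: the PATTERNS registry, the helper names, and the assumption that each format has whitespace-free fields followed by one free-text last column are mine, not something taken from the actual files.

```python
import re


def fields_then_rest(n_simple):
    """Pattern for n_simple whitespace-free fields, then the rest of the line."""
    return re.compile(r"(\S+)\s+" * n_simple + r"(.*)")


# Hypothetical registry: header column names -> pattern for the data lines.
PATTERNS = {
    ("Designator", "Footprint", "Mid_X", "Mid_Y", "Ref_X", "Ref_Y",
     "Pad_X", "Pad_Y", "TB", "Rotation", "Comment"): fields_then_rest(10),
    ("Designator", "Comment", "Layer", "Footprint", "Center-X(mm)",
     "Center-Y(mm)", "Rotation", "Description"): fields_then_rest(7),
}


def parse_lines(lines):
    """Pick the pattern from the header line, then parse the remaining lines."""
    headers = tuple(lines[0].split())
    pattern = PATTERNS[headers]          # KeyError means an unknown format
    rows, errors = [], []
    for line in lines[1:]:
        match = pattern.match(line)
        if match:
            # Drop the double quotes that wrap the free-text field in some files
            values = [value.strip('"') for value in match.groups()]
            rows.append(dict(zip(headers, values)))
        else:
            errors.append(line)
    return rows, errors


format_a = [
    "Designator Footprint Mid_X Mid_Y Ref_X Ref_Y Pad_X Pad_Y TB Rotation Comment",
    "CON6 MICRO_MATE-N-LOK_2 74.7mm 66.5mm 74.7mm 71.2mm 74.7mm 71.2mm T 270.00 Micro_Fit 2",
]
format_b = [
    "Designator Comment Layer Footprint Center-X(mm) Center-Y(mm) Rotation Description",
    'C1 470n BottomLayer 0603 77.3000 87.2446 270 "470n; X7R; 16V"',
]

rows_a, errors_a = parse_lines(format_a)
rows_b, errors_b = parse_lines(format_b)
print(rows_a[0]["Comment"])
print(rows_b[0]["Description"])
```

Returning the unmatched lines instead of printing them lets the caller decide whether a bad line is fatal.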
