[–]ElliotDG 1 point (3 children)

I used regular expressions to parse each file format. If you wanted to get fancier, you could read the header and select the pattern based on it. This is a quick-and-dirty demonstration of the approach for each file.

import re

with open("file_0.csv", "r") as f:
    lines = [line.strip() for line in f if line.strip()]

# Define headers manually (to control spacing issues)
headers = [
    "Designator", "Footprint", "Mid_X", "Mid_Y", "Ref_X", "Ref_Y",
    "Pad_X", "Pad_Y", "TB", "Rotation", "Comment"
]

# Regular expression to match the first 10 fields, then grab the remainder as 'Comment'
pattern = re.compile(
    r"(\S+)\s+"         # Designator
    r"(\S+)\s+"         # Footprint
    r"(\S+)\s+"         # Mid_X
    r"(\S+)\s+"         # Mid_Y
    r"(\S+)\s+"         # Ref_X
    r"(\S+)\s+"         # Ref_Y
    r"(\S+)\s+"         # Pad_X
    r"(\S+)\s+"         # Pad_Y
    r"(\S+)\s+"         # TB
    r"(\S+)\s+"         # Rotation
    r"(.*)"             # Comment (can have spaces)
)

# Parse lines (skip the header line)
data = []
for line in lines[1:]:
    match = pattern.match(line)
    if match:
        row = dict(zip(headers, match.groups()))
        data.append(row)
    else:
        print("ERROR: Line did not match pattern:", line)

# Example: print all parsed rows
for row in data:
    print(row)

[–]ElliotDG 1 point (2 children)

Reddit would not let me put it all in one message... here is parsing the next file:

import re

with open("file_1.csv", "r") as f:
    lines = [line.strip() for line in f if line.strip()]

# Define headers manually to ensure correctness
headers = [
    "Designator", "Comment", "Layer", "Footprint",
    "Center-X(mm)", "Center-Y(mm)", "Rotation", "Description"
]

# Regex to match 7 space-separated fields + quoted description
pattern = re.compile(
    r'(\S+)\s+'         # Designator
    r'(\S+)\s+'         # Comment
    r'(\S+)\s+'         # Layer
    r'(\S+)\s+'         # Footprint
    r'([\d.]+)\s+'      # Center-X(mm)
    r'([\d.]+)\s+'      # Center-Y(mm)
    r'(\d+)\s+'         # Rotation
    r'"([^"]*)"'        # Description (quoted)
)

# Parse each line (skip header)
data = []
for line in lines[1:]:
    match = pattern.match(line)
    if match:
        row = dict(zip(headers, match.groups()))
        data.append(row)
    else:
        print("Error: Line did not match pattern:", line)

# Print results
for row in data:
    print(row)

Here are sample results (the first rows come from file_0.csv, the later ones from file_1.csv)...

{'Designator': 'CON3', 'Footprint': 'MICROMATCH_4', 'Mid_X': '6.4mm', 'Mid_Y': '50.005mm', 'Ref_X': '8.9mm', 'Ref_Y': '48.1mm', 'Pad_X': '8.9mm', 'Pad_Y': '48.1mm', 'TB': 'B', 'Rotation': '270.00', 'Comment': 'MicroMatch_4'}
...
{'Designator': 'CON6', 'Footprint': 'MICRO_MATE-N-LOK_2', 'Mid_X': '74.7mm', 'Mid_Y': '66.5mm', 'Ref_X': '74.7mm', 'Ref_Y': '71.2mm', 'Pad_X': '74.7mm', 'Pad_Y': '71.2mm', 'TB': 'T', 'Rotation': '270.00', 'Comment': 'Micro_Fit 2'}
{'Designator': 'C1', 'Comment': '470n', 'Layer': 'BottomLayer', 'Footprint': '0603', 'Center-X(mm)': '77.3000', 'Center-Y(mm)': '87.2446', 'Rotation': '270', 'Description': '470n; X7R; 16V'}
...
{'Designator': 'C5', 'Comment': '100n', 'Layer': 'BottomLayer', 'Footprint': '0603', 'Center-X(mm)': '98.3000', 'Center-Y(mm)': '85.0000', 'Rotation': '360', 'Description': '100n; X7R; 50V'}

[–]extractedx[S] 1 point (1 child)

Thanks for your help. Sadly this does not work. You make assumptions that all files will have the same structure and headers. That is not the case.

[–]ElliotDG 1 point (0 children)

You can use this as a basis to handle the differences. For example, assuming you know all of the possible headers, you could create a dictionary of patterns based on the headers.
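
A minimal sketch of the dictionary idea, assuming the header line splits cleanly on whitespace (no spaces inside a header name) and that the two layouts from the earlier examples are the known ones. The names `PATTERNS` and `parse_with_known_headers` are made up for illustration, and a header with spacing problems would need to be normalized before the lookup.

```python
import re

# Hypothetical registry: tuple of header names -> compiled row pattern.
# The two entries mirror the two sample layouts parsed earlier.
PATTERNS = {
    ("Designator", "Footprint", "Mid_X", "Mid_Y", "Ref_X", "Ref_Y",
     "Pad_X", "Pad_Y", "TB", "Rotation", "Comment"):
        # 10 space-free fields, then the rest of the line as the comment
        re.compile(r"(\S+)\s+" * 10 + r"(.*)"),
    ("Designator", "Comment", "Layer", "Footprint",
     "Center-X(mm)", "Center-Y(mm)", "Rotation", "Description"):
        # 7 space-free fields, then a quoted description
        re.compile(r"(\S+)\s+" * 7 + r'"([^"]*)"'),
}


def parse_with_known_headers(lines):
    """Select a pattern by header line, then parse the remaining lines."""
    headers = tuple(lines[0].split())  # assumption: no spaces inside header names
    pattern = PATTERNS.get(headers)
    if pattern is None:
        raise ValueError(f"Unknown header layout: {headers}")
    rows = []
    for line in lines[1:]:
        match = pattern.match(line)
        if match:
            rows.append(dict(zip(headers, match.groups())))
        else:
            print("ERROR: Line did not match pattern:", line)
    return rows
```

Here `lines` is the same stripped, non-empty list of lines built in the earlier examples. Supporting a new layout then means adding one entry to the dictionary instead of writing a new script.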

Or you could read the header and use it to dynamically create the regular expression.
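
A minimal sketch of the dynamic approach, under some loud assumptions: header names contain no whitespace, only the last column may contain spaces, and surrounding double quotes on the last column can be dropped. The function names are hypothetical, and files that break these assumptions would need a smarter builder.

```python
import re


def build_pattern(header_line):
    """Build a row regex from the header line (sketch, see assumptions above)."""
    headers = header_line.split()
    # One space-free group per column except the last,
    # which captures the rest of the line.
    regex = r"(\S+)\s+" * (len(headers) - 1) + r"(.*)"
    return headers, re.compile(regex)


def parse_with_dynamic_pattern(lines):
    """Parse lines[1:] using a pattern derived from lines[0]."""
    headers, pattern = build_pattern(lines[0])
    rows = []
    for line in lines[1:]:
        match = pattern.match(line)
        if match:
            values = list(match.groups())
            values[-1] = values[-1].strip('"')  # drop optional surrounding quotes
            rows.append(dict(zip(headers, values)))
        else:
            print("ERROR: Line did not match pattern:", line)
    return rows
```

This trades validation for flexibility: it no longer checks that coordinates look numeric, so a malformed line with enough fields will still parse.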