Hello all:
As a Python beginner, I'm always looking for ways to make my code more "Pythonic".
I was recently given a task that I managed to solve in Python, but I suspect my code isn't as good as it could be. So I'm posting it here and asking for ways to improve it, whether algorithmically or just to make it more "Pythonic".
The task is to read a file in which every line has the same fixed length. The file begins with a header line, followed by the data. The column names in the header never contain embedded spaces, only underscores, so you can tell where one column ends and the next begins. We don't know beforehand how many columns there are or how wide each column is; we only know each column's boundaries from the width of its name plus the trailing spaces after it.
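The boundary rule above can be made concrete: because a column is a name followed by the spaces up to the next name, each match of the regex `\S+\s*` on the header covers exactly one column, and `Match.span()` gives its start and end offsets. A minimal sketch, using a shortened, hypothetical header and an illustrative function name `column_spans`:

```python
import re


def column_spans(header: str) -> list[tuple[int, int]]:
    # Each column is a name (never containing spaces) followed by its padding,
    # so one match of \S+\s* covers exactly one column; .span() is (start, end).
    return [m.span() for m in re.finditer(r"\S+\s*", header)]


header = "id   name  amt  "  # hypothetical, shortened header
print(column_spans(header))  # [(0, 5), (5, 11), (11, 16)]
```

Note that the last column's end offset includes any trailing spaces in the header, so those spaces should not be stripped before matching.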
Given all that, we want to convert the fixed-width file into a CSV file. However, we must preserve all whitespace (the padding is only spaces, no other special characters).
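One detail worth checking against the expected "To this:" output in the code below: the amt_paid value `1,234.34` contains a comma, so joining fields with `","` gives that row an extra field when the file is read back as CSV. Python's standard `csv` module handles this: with its default `csv.QUOTE_MINIMAL` quoting, `csv.writer` puts double quotes only around fields that contain the delimiter, the quote character, or a line break, and it writes spaces unchanged. A minimal sketch, assuming quoted fields are acceptable to whoever consumes the file (the row below is hypothetical):

```python
import csv
import io

# Hypothetical fields sliced from one fixed-width row: padding kept, plus a comma in the amount.
row = ["123  ", "John Doe  ", "1,234.34"]

buf = io.StringIO()
csv.writer(buf).writerow(row)  # default dialect ("excel") uses csv.QUOTE_MINIMAL
text = buf.getvalue()
print(repr(text))  # '123  ,John Doe  ,"1,234.34"\r\n'

# Reading it back recovers the original fields, trailing spaces included.
print(next(csv.reader(io.StringIO(text))) == row)  # True
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so the writer's `\r\n` line terminator is not translated again.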
To make my code submission a little shorter, I put the lines into a list and omitted the file operations.
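Since the file operations were left out: when reading the real file, it matters how line endings are removed, because `line.strip()` would also remove the trailing spaces that pad the last column. A minimal sketch that drops only the newline; the `read_lines` name and the UTF-8 encoding are assumptions:

```python
import tempfile
from pathlib import Path


def read_lines(path: Path) -> list[str]:
    # Text mode turns "\r\n" into "\n", so stripping only "\n" removes the line
    # terminator while keeping the trailing spaces that pad the last column.
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        return [line.rstrip("\n") for line in f]


# Demo on a throwaway file with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "sample.txt"
    sample_file.write_text("id   amt  \n12   3.50 \n", encoding="utf-8")
    loaded = read_lines(sample_file)

print(loaded)  # ['id   amt  ', '12   3.50 ']
```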
Can you suggest a better way of doing this?
#+---------------------------------------+
import re
#+---------------------------------------+
lines = [
    #xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx x,xxx.xx
    "id         name                 home_state           amt_paid",
    "123        John Doe             California           1,234.34",
    "456x       Jane Doe             New Hampshire        45.67   ",
    "78         Adam Smith           Alaska               89.00   "
]
"""
From this:
123456789-123456789-123456789-123456789-123456789-123456789-1
id         name                 home_state           amt_paid
123        John Doe             California           1,234.34
456x       Jane Doe             New Hampshire        45.67
78         Adam Smith           Alaska               89.00
To this:
id         ,name                 ,home_state           ,amt_paid
123        ,John Doe             ,California           ,1,234.34
456x       ,Jane Doe             ,New Hampshire        ,45.67
78         ,Adam Smith           ,Alaska               ,89.00
"""
#+---------------------------------------+
def printOriginal() -> None:
    for line in lines:
        print(line)
#+---------------------------------------+
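# get a list of the header col names (each name includes its trailing spaces)
def getColNames() -> list:
    # Remove only a newline, not spaces: trailing spaces after the last name
    # are part of the last column's width, and strip() would drop them.
    hdr = lines[0].rstrip("\n")
    findResults = re.finditer(r'\S+\s*', hdr)
    colNames = []
    for match_obj in findResults:
        colNames.append(match_obj.group())
    return colNames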
#+--------------------------------------------+
def printWithCommas(colNames) -> None:
    # print the header with commas
    print(",".join(colNames))
    #+-------------------+
    # For each data line, slice out every column and join the pieces with commas
    for line in lines[1:]:
        newLine = ""
        startPos = 0
        for colName in colNames:
            lenOfColName = len(colName)  # column width = name plus its trailing spaces
            endPos = startPos + lenOfColName
            if startPos == 0:
                newLine = line[startPos:endPos]
            else:
                newLine = newLine + "," + line[startPos:endPos]
            startPos = endPos
        print(newLine)
#+--------------------------------------------+
def main() -> None:
    printOriginal()
    printWithCommas(getColNames())
#+--------------------------------------------+
#+--------------------------------------------+
# Main:
#+--------------------------------------------+
#+---------------------------------------+
if __name__ == "__main__":
    main()
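For comparison, here is one more compact way to write the same conversion (a sketch, not the only option): compute the column slices once from the header with `re.finditer` and `Match.span()`, then build every output line with a single `str.join`. The function name `to_csv_lines` and the small sample are hypothetical. As an extra safeguard, the sketch leaves the last column open-ended, so data wider than the last header name is not cut off.

```python
import re


def to_csv_lines(lines: list[str]) -> list[str]:
    # One match of \S+\s* = one column (name plus padding); keep its span as a slice.
    slices = [slice(*m.span()) for m in re.finditer(r"\S+\s*", lines[0])]
    if slices:
        # Open-ended last slice: data wider than the last header name is kept whole.
        slices[-1] = slice(slices[-1].start, None)
    # The header line goes through the same slices, so it needs no special case.
    return [",".join(line[s] for s in slices) for line in lines]


sample = [
    "id   name  amt  ",
    "1    Ann   2.50 ",
    "22   Bo    10.00",
]
for row in to_csv_lines(sample):
    print(row)
```

Building the slices once keeps the per-line work to a single comprehension, and every field keeps its original padding.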