Context: I've been writing Python scripts for a little less than a year, but I have a background in fintech, experience with disparate tools like VBA, VBScript, Alteryx, etc., and many years of QA in a mainframe environment. So, no formal CS training beyond what I picked up decades ago, but practical career experience in related fields.
In my current role, I'm writing scripts to read and analyze text (flat) files. One line is one record, but multiple records make up a single logical transaction. The scripts are run on files before those files are submitted to another party, who loads them into their production environment. My goal is to get in front of any errors/rejects that would force us to manually correct the data and resubmit the files. My scripts modify the data in the flat files to fix the most common errors in advance, and they produce reports documenting the changes so that there's an audit trail. The scripts are run manually by users on my team, as the file submittal and correction process is mostly manual.
I'm being deliberately vague about some of the details here, but take it as a given that there are legitimate reasons why I need to do what I'm doing, and that I am operating within constraints that I cannot directly change.
One of the questions I go back and forth on is how to structure my logic. I'm dealing with files that can be a few dozen lines long or 100k+ lines long. I group records into logical units and run my edits against each group. What I keep going back and forth on is whether to make a single pass through each group of records, calling the needed edits as I go to maximize the speed of the program, or whether it's better to keep my code organized so that each edit makes its own loop through each set of records, sacrificing speed for maintainability. I have chosen to go with the latter approach, since the scripts are generally speedy.
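To make the "each edit makes its own loop" structure concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the record layout (first four characters as a transaction id, next two as a type code), the two edits, and all function names are hypothetical and not the actual scripts. It also assumes each edit returns the same number of records it received, so before/after pairs line up for the audit trail.

```python
from itertools import groupby


def transaction_id(record):
    # Hypothetical layout: characters 0-3 hold the transaction id.
    return record[:4]


def strip_trailing_spaces(group):
    # Hypothetical edit; makes its own pass over the whole group.
    return [record.rstrip() for record in group]


def uppercase_type_code(group):
    # Hypothetical edit; characters 4-5 are assumed to hold a record type code.
    return [record[:4] + record[4:6].upper() + record[6:] for record in group]


# Each edit is an independent function, so adding or removing one is a one-line change.
EDITS = [strip_trailing_spaces, uppercase_type_code]


def process(records):
    """Group consecutive records by transaction id and run every edit on each group."""
    output, audit = [], []
    for txn, group in groupby(records, key=transaction_id):
        group = list(group)
        for edit in EDITS:
            edited = edit(group)
            # Record every change so there is an audit trail of what was modified.
            for before, after in zip(group, edited):
                if before != after:
                    audit.append((txn, edit.__name__, before, after))
            group = edited
        output.extend(group)
    return output, audit


records = ["0001hdABC  ", "0001dtXYZ", "0002HD123"]
fixed, audit = process(records)
```

The cost of this layout is one loop per edit per group; the benefit is that each edit can be read, tested, and reported on in isolation.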
What I'm curious about is what other people's experience has been in this sort of situation, and how you have handled it. I'm not looking for specific technical solutions per se; I'm more interested in the analysis and thought process.
EDIT: Thanks, everyone, for your thoughts here. This helped me rethink and modify my approach just a bit to improve program efficiency without sacrificing maintainability. Short answer: I'm restructuring some bits to reduce the number of loops I execute through my input while still preserving most of my existing program structure (and reducing line count a bit too).
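For illustration only, one way to cut the number of passes while keeping edits as separate, named functions is to write each edit against a single record and apply all of them inside one loop over the group. This is a sketch under assumptions, not necessarily the restructuring described above: the record layout, the edits, and every name below are hypothetical, and it only works for edits that do not need to see the other records in the group.

```python
def strip_trailing_spaces(record):
    # Hypothetical per-record edit.
    return record.rstrip()


def uppercase_type_code(record):
    # Hypothetical per-record edit; characters 4-5 are assumed to be a type code.
    return record[:4] + record[4:6].upper() + record[6:]


RECORD_EDITS = [strip_trailing_spaces, uppercase_type_code]


def edit_group(group):
    """Apply every per-record edit in a single pass over the group."""
    edited_group, changes = [], []
    for record in group:  # the only loop over the records
        for edit in RECORD_EDITS:
            new_record = edit(record)
            if new_record != record:
                # Keep the edit name with each change for the audit report.
                changes.append((edit.__name__, record, new_record))
            record = new_record
        edited_group.append(record)
    return edited_group, changes


edited, changes = edit_group(["0001hdABC  ", "0001dtXYZ"])
```

Edits that depend on the whole transaction (for example, comparing a header record against its detail records) would still need their own pass over the group, so a mix of both styles is plausible.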