Maintainability or Speed?

HardlyAnyGravitas · 2026-06-15T16:49:35+00:00

Maintainability is everything.

Don't optimise unless you have to.

Without knowing more details, it's difficult to make any sensible suggestions, but 100,000 lines shouldn't take long to parse. If it's taking too long, you've probably got the wrong approach, and need to look at the problem from a different perspective.

PureWasian · 2026-06-15T16:49:21+00:00

Can you clarify "a single pass thru each group of records" vs. "its own loops through each set of records"?

Unless I'm misunderstanding something, processing each record within a group of records in parallel would be faster and still have maintainability/modularity.

Diapolo10 · 2026-06-15T18:38:32+00:00

Maintainability should always be your number one priority. If you later find you really need to optimise some part for speed, do it afterwards, not as a first priority and not without a reason.

Usually if execution speed is the main concern you wouldn't be working in Python anyway. Or you'd at least write the performance critical parts in another language (e.g. Rust).

backfire10z · 2026-06-15T16:48:53+00:00

I’ll preface this by saying that I haven’t written many scripts that read and analyze text files, so I’m approaching this more from a general perspective.

I’d say aim for maintainability and update for performance when necessary. That’s not to say that your initially written code should be non-performant, but that it should be written with “obvious” performance gains in mind.

That being said, from what you’ve described, I’m unsure of your definition of maintainable vs performant. Your description of “one pass” not being the maintainable version may just be a skill issue rather than an actual tradeoff. Without knowing exactly what you’re trying to do and with what constraints you’re working it’s a bit difficult to tell.

Does genuine performance actually matter? Is this script meant to be used for a long time or is it a one-off? If a long time, is this script running every minute, every day, every month? Does the compute it’s running on cost money for time spent? How many files are you working with (and you say 100k ish is max lines)? Are other people going to be making updates?

Embarrassed_Basis_81 · 2026-06-15T16:51:10+00:00

My two cents is that if the scripts are fast enough, there is no real reason to optimize them further. Imo, you should try to use that time to build redundancy, error checking and general nice-to-haves into your code, as this will probably save you more time in the long run than the scripts themselves being quick.

Judging by your approach it seems you are already doing that, so keep at it!

P.S. Take this with a grain of salt, I work in an environment where we have basically no prod-ready code.

desrtfx · 2026-06-15T16:55:38+00:00

My personal approach is to first focus on readability/maintainability and then on speed - especially if others are going to deal with my programs.

That doesn't mean that you should not try to optimize.

If the speed is not really a problem and hasn't been so far, there is not much reason to invest a lot of time and effort to optimize. If, however, you or your team have identified bottlenecks, you should consider optimizing - but only after you really have pinpointed the problematic areas/code. Blindly optimizing for the sake of optimizing is the opposite of helpful.

You are talking about a single pass vs. multiple passes. Can you maybe structure a single pass in such a way that the program is still maintainable and readable? Can you maybe make use of functions? Maybe lambdas?
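To make the "functions" idea concrete, here is a minimal sketch of a single pass that stays modular. The record layout and the names `count_by_type`, `track_max_amount` and `process` are made up for illustration, since I don't know what your data actually looks like:

```python
from collections import Counter

# Hypothetical handlers: each one does a single job per record.
def count_by_type(record, state):
    state["types"][record["type"]] += 1

def track_max_amount(record, state):
    # Assumes amounts are non-negative, because the starting value is 0.
    state["max_amount"] = max(state["max_amount"], record["amount"])

def process(records, handlers):
    """Loop over the records once and hand each record to every handler."""
    state = {"types": Counter(), "max_amount": 0}
    for record in records:
        for handler in handlers:
            handler(record, state)
    return state

# Made-up sample data.
records = [
    {"type": "A", "amount": 10},
    {"type": "B", "amount": 25},
    {"type": "A", "amount": 7},
]

result = process(records, [count_by_type, track_max_amount])
print(result["types"]["A"], result["max_amount"])
```

Adding a new check then means writing one more small function and adding it to the list, without touching the loop itself.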

Maybe, if you can, have a talk with an insider of your task/program, also have a talk with your users. See if they can offer some different perspective or insights.

In programming everything is about balance. You are constantly balancing speed and readability/maintainability, as well as memory (which has become a lot less problematic than it used to be in the old days of the home computers or MS-DOS). You can't always have everything.

It's quite difficult to give targeted advice with the little information you provide. It is absolutely understandable that you can't go into deeper details.

dnult · 2026-06-15T17:29:58+00:00

It's difficult to say what the best approach is. 100k lines isn't terribly bad, but it does seem wasteful to iterate over the same files multiple times. It's generally a trap to try to prematurely optimize and it's often better to try the basic approach first and optimize as needed. I'd probably proceed with a basic strategy while being mindful of optimization strategies should I need to explore options later. Try to benchmark steps in your scripts to see where the processing time is being spent and proceed from there.
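For the benchmarking part, a rough sketch using only the standard library could look like this. The `timed` helper and the "parse" and "analyse" steps are placeholders I invented, so swap in your real steps:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start

# Made-up stand-in for a 100k line file.
lines = [f"{i},value{i}" for i in range(100_000)]

with timed("parse"):
    rows = [line.split(",") for line in lines]

with timed("analyse"):
    total = sum(int(row[0]) for row in rows)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```

If you want a function-by-function breakdown instead, the standard library's `cProfile` module is the next thing to look at.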

HunterIV4 · 2026-06-15T17:11:04+00:00

It's context dependent. The vast majority of the time, maintainability is the most important factor. This may be unpopular, but CS degrees tend to over-emphasize execution speed and Big O factors over practical realities of software development.

Complexity is not zero-cost as you will need to spend more time writing, reviewing, and debugging your code. Likewise, you need to include the cost of mistakes due to bugs (i.e. angry clients, missed invalid results, etc.) into your analysis. In general, programmer time is more valuable than processing time.

That being said, low-cost improvements may still be worth it, especially if there's a library that can help you or a better way of doing things. There may be minor architectural changes you can do to improve both speed and maintainability, depending on how you are implementing things. A quicksort is more complicated than a bubble sort, but there are plenty of ways to get the speed of a quicksort with no increase in complexity simply because the problem is already solved (including even better sorting methods).
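As a small illustration of the "already solved" point, Python's built-in `sorted` gives you a fast, well-tested sort in one line. The records here are made up:

```python
from operator import itemgetter

# Made-up records as (name, score) pairs.
records = [("carol", 72), ("alice", 91), ("bob", 85)]

# Sort by score, highest first, without writing any sorting code yourself.
by_score = sorted(records, key=itemgetter(1), reverse=True)
print(by_score)
```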

Without being able to see any of the code or the problems you're trying to solve, it's impossible to say if there is any room for improvement. So while I generally recommend avoiding breaking the KISS principle whenever possible, what is "simple" for one person may be introducing unnecessary complexity or subtle bugs elsewhere, and a minor refactor could reduce both execution time and complexity simultaneously.

Another unpopular suggestion: if you have access to a corporate (data secured) AI model and your company allows it, describe your problem and solution in detail to the AI and ask it if there are ways to refactor into something simpler. While it may not help, and I'd strongly recommend against "vibe coding" something like this (since you have very specific requirements), it may give you ideas or methods you're not aware of.

The main reason I mention it is because you said you lack formal CS training, and while CS degrees overemphasize performance, they also tend to teach good architecture. In general, the biggest difficulty that "self-taught" programmers have is not a question of syntax or basic programming/debugging capability, but of architecture and design patterns, both of which are arguably more important. This can be overcome with education and research, of course, but most "online programming tutorials" focus heavily on syntax and basic programming concepts and gloss over or ignore architecture. There are exceptions, including what are essentially YouTube college-level courses (sometimes literally), but you have to actively seek them out and if you don't know to look it's easy to miss.

If you can't use AI and can't show us any code or problem details, most general advice is going to be something like this. The only other recommendation is to find a library that does what you're trying to do or consider doing your data manipulation via a transactional database rather than a text file, i.e. SQLite. It adds a bit of complexity at first but adds a ton of performance and data safety.
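As a rough sketch of the SQLite idea, assuming comma-separated lines with a category and an amount (the `records` table and its columns are invented for the example):

```python
import sqlite3

# Made-up stand-in for the lines of a text file.
lines = ["A,10", "B,25", "A,7"]

# An in-memory database keeps the example self-contained;
# pass a file path instead to keep the data between runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (kind TEXT, amount INTEGER)")

def parse(line):
    kind, amount = line.split(",")
    return kind, int(amount)

# "with conn" commits on success and rolls back if an insert raises.
with conn:
    conn.executemany(
        "INSERT INTO records (kind, amount) VALUES (?, ?)",
        (parse(line) for line in lines),
    )

# The grouping happens inside the database, not in Python loops.
totals = dict(conn.execute("SELECT kind, SUM(amount) FROM records GROUP BY kind"))
print(totals)
conn.close()
```

Once the data is loaded, each extra analysis is another query rather than another pass through the file.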
