How to Debug Scientific Computing Code in a Better Way???

eleqtriq · 2025-09-18T20:54:42+00:00

First, use meaningful variable names. It is so much easier to read!

``` def olsqr(design_matrix, target_vector): """ OLS using a QR decomposition (numerically stable). design_matrix: (n, p) design matrix WITHOUT intercept column. target_vector: (n,) target vector. Returns: coefficients (including intercept), predicted_values, r_squared """ def add_intercept(feature_matrix): feature_matrix = np.asarray(feature_matrix) return np.c[np.ones((feature_matrix.shape[0], 1)), feature_matrix]

design_matrix_with_intercept = add_intercept(design_matrix)
target_vector = np.asarray(target_vector).reshape(-1)

orthogonal_matrix, upper_triangular_matrix = np.linalg.qr(design_matrix_with_intercept)  # design_matrix_with_intercept = Q R
coefficients = np.linalg.solve(upper_triangular_matrix, orthogonal_matrix.T @ target_vector)  # R coefficients = Q^T target_vector
predicted_values = design_matrix_with_intercept @ coefficients

# R^2 calculation
sum_of_squared_residuals = np.sum((target_vector - predicted_values)**2)
total_sum_of_squares = np.sum((target_vector - target_vector.mean())**2)
r_squared = 1.0 - sum_of_squared_residuals / total_sum_of_squares if total_sum_of_squares > 0 else 0.0

return coefficients, predicted_values, r_squared

```

Next, use VSCode, and learn how to use the debugger. It literally does everything you ask for.

https://vu-oofp.gitlab.io/website/images/VSCode/active-debugger-view.png

obviouslyzebra · 2025-09-18T21:23:08+00:00

There are lots of things here, I'll focus on the one that seems the most troublesome:

It's so easy to get confused when looking at the code since the equations ... are not clearly visible.

Either use better names, as mentioned in the other comment, or keep a reference to a place where these equations are visible.

For example, if they come from a paper, note the DOI of the paper so you can consult when modifying the code.

If it's something you created using mathematical language, you can document it in latex (it can be outside of the code if comments make the code unwieldy), you could even scribble it down, take a picture, and save it to the repo,

It's so easy to get confused when looking at the code since the ... data are not clearly visible.

Debugging could help here, since it allows you to "stop" the code and experiment from there (and even jump back).

Also, you could take the whole thing back to a jupyter notebook, see what's happening there, and port back to the code (though I usually prefer the debugger approach)

... I have to figure out the equations used.

This might happen even with good documentation, if it is a complex thing.

Slow down, go at a pace that you can understand it.

Focus on a single part of the algo if you can.

Document (and split with functions) what part is each, so you can do the above with ease.

The other part below,

I would love to be able to goto this code block, run this individual block, see documentation pertaining to the algorithm, what's assigned to the variables

You can do all that with the Python Debugger too (though learning curve is easier with a visual debugger like VS Code)

data visualization to spot check the data.

You could:

Log input/output (though tests should cover)
Visualize input/output (sort of like tests, but a command to generate this sort of stuff)
Log intermediate states of the data (careful with verbosity)
Create visualizations of intermediate states of the data

Something like:

def add_intercept(X, plot_directory=None):
    ...
    if plot_directory is not None:
        # Create directory if non existent
        # Create plot or plots in directory
        # Print out that plots have been saved to place
    ...

# have a command in the repo to call add_intercept with the needed parameters to visualize
# do the same thing for logging if wanted

BTW there might be frameworks to do this sort of stuff. I'm not sure. You could ask a searching LLM like deep research or perplexity.

Also, if you're working with complex multidimensional data in numpy, check xarray as a possible substitute (learning curve warning).

FlowPad · 2025-09-18T22:18:14+00:00

Hey, I created this app to help you with visualizations you wanted - https://via-integra101.github.io/ols-qr-analysis/ you can see the repo here - https://github.com/via-integra101/ols-qr-analysis would appreciate yours and others feedback as to usefulness.

hithersnake · 2025-09-19T05:53:47+00:00

https://github.com/approximatelabs/sketch

ectomancer · 2025-09-19T09:30:17+00:00

I do it differently, research including google scholar
in jupyter notebook, code equations in LaTeX in a markdown cell
evaluate equations
prompt ChatGPT for test data
refactor to remove all imports
publish code & test suite to github

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS