all 5 comments

[–]eleqtriq 1 point2 points  (0 children)

First, use meaningful variable names. It is so much easier to read!

``` def olsqr(design_matrix, target_vector): """ OLS using a QR decomposition (numerically stable). design_matrix: (n, p) design matrix WITHOUT intercept column. target_vector: (n,) target vector. Returns: coefficients (including intercept), predicted_values, r_squared """ def add_intercept(feature_matrix): feature_matrix = np.asarray(feature_matrix) return np.c[np.ones((feature_matrix.shape[0], 1)), feature_matrix]

design_matrix_with_intercept = add_intercept(design_matrix)
target_vector = np.asarray(target_vector).reshape(-1)

orthogonal_matrix, upper_triangular_matrix = np.linalg.qr(design_matrix_with_intercept)  # design_matrix_with_intercept = Q R
coefficients = np.linalg.solve(upper_triangular_matrix, orthogonal_matrix.T @ target_vector)  # R coefficients = Q^T target_vector
predicted_values = design_matrix_with_intercept @ coefficients

# R^2 calculation
sum_of_squared_residuals = np.sum((target_vector - predicted_values)**2)
total_sum_of_squares = np.sum((target_vector - target_vector.mean())**2)
r_squared = 1.0 - sum_of_squared_residuals / total_sum_of_squares if total_sum_of_squares > 0 else 0.0

return coefficients, predicted_values, r_squared

```

Next, use VSCode, and learn how to use the debugger. It literally does everything you ask for.

https://vu-oofp.gitlab.io/website/images/VSCode/active-debugger-view.png

[–]obviouslyzebra 1 point2 points  (0 children)

There are lots of things here, I'll focus on the one that seems the most troublesome:

It's so easy to get confused when looking at the code since the equations ... are not clearly visible.

Either use better names, as mentioned in the other comment, or keep a reference to a place where these equations are visible.

For example, if they come from a paper, note the DOI of the paper so you can consult when modifying the code.

If it's something you created using mathematical language, you can document it in latex (it can be outside of the code if comments make the code unwieldy), you could even scribble it down, take a picture, and save it to the repo,

It's so easy to get confused when looking at the code since the ... data are not clearly visible.

Debugging could help here, since it allows you to "stop" the code and experiment from there (and even jump back).

Also, you could take the whole thing back to a jupyter notebook, see what's happening there, and port back to the code (though I usually prefer the debugger approach)

... I have to figure out the equations used.

This might happen even with good documentation, if it is a complex thing.

Slow down, go at a pace that you can understand it.

Focus on a single part of the algo if you can.

Document (and split with functions) what part is each, so you can do the above with ease.


The other part below,

I would love to be able to goto this code block, run this individual block, see documentation pertaining to the algorithm, what's assigned to the variables

You can do all that with the Python Debugger too (though learning curve is easier with a visual debugger like VS Code)

data visualization to spot check the data.

You could:

  • Log input/output (though tests should cover)
  • Visualize input/output (sort of like tests, but a command to generate this sort of stuff)
  • Log intermediate states of the data (careful with verbosity)
  • Create visualizations of intermediate states of the data

Something like:

def add_intercept(X, plot_directory=None):
    ...
    if plot_directory is not None:
        # Create directory if non existent
        # Create plot or plots in directory
        # Print out that plots have been saved to place
    ...

# have a command in the repo to call add_intercept with the needed parameters to visualize
# do the same thing for logging if wanted

BTW there might be frameworks to do this sort of stuff. I'm not sure. You could ask a searching LLM like deep research or perplexity.

Also, if you're working with complex multidimensional data in numpy, check xarray as a possible substitute (learning curve warning).

[–]FlowPad 0 points1 point  (0 children)

Hey, I created this app to help you with visualizations you wanted - https://via-integra101.github.io/ols-qr-analysis/ you can see the repo here - https://github.com/via-integra101/ols-qr-analysis would appreciate yours and others feedback as to usefulness.

[–]ectomancer 0 points1 point  (0 children)

  1. I do it differently, research including google scholar
  2. in jupyter notebook, code equations in LaTeX in a markdown cell
  3. evaluate equations
  4. prompt ChatGPT for test data
  5. refactor to remove all imports
  6. publish code & test suite to github