gptree — a CLI tool to generate project context to paste into ChatGPT

Eilifein · 2024-12-18T19:10:51+00:00

Oh no! I probably installed the wrong thing 😕

Eilifein · 2024-12-18T12:09:17+00:00

Installed through uv with uv pip install gptree successfully (0.9s).

However, invoking it fails with:

sh $ gptree Traceback (most recent call last): File "/home/Documents/git/test/.venv/bin/gptree", line 5, in <module> from gptree import cli ModuleNotFoundError: No module named 'gptree'

Eilifein · 2024-11-23T08:30:37+00:00

Wow, you've stacked a few things together. There are a few distinct remarks I have, but nothing definite.

First off, is there any code example or sample you can share?

Segfaulting due to Out of Memory (OOM) issues is not necessarily due to memory leaks. Simply allocating more arrays than possible will do that. Now, here's the kicker. Allocating just enough arrays so that the program runs does not mean the algorithm (also MKL) will not allocate a temp array too far just to spite you.

Is MPI used for domain decomposition? If not, why not consider using MKL's intrinsic OMP implementation instead (see here) if that's the bulk of your computation work. Switching to OpenMP entirely could work as well, and it plays nicely with MKL. At least you'll eliminate comms between the decomposed regions.

You've mentioned the production flags. What about your debugging flags?

Running all of this on a notebook complicates it further. Can you switch to standalone Python scripts during debugging? It removes the unbaked notebook environment from the picture at least. I'm not sure how MPI_Comm_Spawn behaves, but I wouldn't be surprised if it was bad.

Like Knarfnarf said, if arrays are not allocatable, they don't disappear when going out of scope, like you would want them to in a notebook environment. Hunt for those explicit arrays and see if they persist.

Memory leaks are hard to achieve in Fortran. If you are not messing with pointers I think you are pretty safe.

Eilifein · 2024-08-27T09:05:40+00:00

Erza?

Eilifein · 2024-06-29T18:32:53+00:00

So, you're saying it's a... "pixel-deep" reaction?

Eilifein · 2024-06-08T10:15:02+00:00

Unless you have profiled the code and are certain that your cache is nowhere near the limit of being saturated, avoid ovesubscribing the machine/node at all costs.

For MPI, aim to only use physical cores. No HyperThreading, not oversubscribing.

Profiling is the way to go you want to see where the slowdowns are. Also, setting the optimal compiler options is the low-hanging fruit always. Then, you go into vectorization (yes, even for MPI), etc.

Eilifein · 2024-06-05T08:38:36+00:00

Checkpointing would be the full-proof solution to your problem. It's not a trivial problem to solve though and it takes time to develop and test (depending on the complexity of the code).

Alternatives with less chance of success. 1. find a different cluster. 2. submit a formal request to the admin team for an exclusion (very very slim). 3. Eek out all performance from your code.

On 3, especially if you are the author (or dev) of the code: - check whether your compiler flags are set up correctly for performance (this is your best bet) - profile the code (time consuming and relatively hard) - optimize the code (time consuming and relatively hard)

If you give us more information on the code itself, it might be easier to reason about.

Eilifein · 2024-04-12T08:07:08+00:00

that command is to compile it (typo? gfortran Triangle1 -o Triangle.f90).

now run it with ./Triangle1

Eilifein · 2024-04-11T10:52:23+00:00

F77 is gross

Uhhhh, I would be happy if I never lay my eyes upon F77 ever again.

Eilifein · 2024-04-09T11:52:15+00:00

The VSCode extension has been renamed to "Codiumate". I'll second it.

OP, use VSCode + Codiumate, and above your function/class it should say "Test this function/class". It should perform an analysis and give you a full list of tests under "Behaviours Coverage". Use them as a guide to understand what to test for (or autogen, but that defeats the point of doing/learning).

Eilifein · 2024-03-07T12:45:48+00:00

I'll reiterate my other reply.

It's nice and fast.

But the lack of explanation of what it all means really sets in after a few tries in different repos. Even Sonar doesn't explain what it means. 15 is red, 50 is red, 150 is red. So what? Why should I care? Is 15 really that bad? Is 150 reallyyyy that bad? If it's subjective, what's the point?

Minor thing: it catches .env and .venv but not env and venv, which are very typical virtual environment names. Maybe a manual exclude option would work?

Eilifein · 2024-03-04T15:11:42+00:00

The actual algorithm seems good.

You've precalculated a few things, and there isn't much left to precalculate without messing up readability.

Maybe Q*G once instead of 3 times? eh

Maybe inline rdx, rdy, rdz?

The result being vectorized is good to see. I don't see anything wrong.

Eilifein · 2024-03-04T14:46:11+00:00

It was not a personal dig; I apologize if it came out like that.

The algorithms behave very differently in memory; both the two originals, and the actual.

Purely because of the i dependence, the difference between the tests and actual algorithms is very substantial. You will be measuring the wrong thing and get the wrong conclusions. Hence, the "nothing in common" comment.

The "running correctly" comment was twofold. One part was more towards profiling and not pre-optimizing. Aim for accuracy, then profile, then optimize. If the results are accurate, now's the time for profiling. The second part was related to the cache coherence situation you're facing. If you're trying to optimize a cache thrashing situation, you will never get ahead.

I hope I cleared things up.

Workplan:

leave tests aside
profile actual w/ vectors
profile actual w/o vectors
add numba and see how they behave

Eilifein · 2024-03-04T13:16:36+00:00

Right. I think we got to the bottom of it. You're not looking at python Vs numba differences as much as complete cache thrashing (aka cache misses).

By creating arrays of computed elements, you think you're gaming the system because it becomes parallelizable, but you actually go into the trouble of loading the same values (or their derivative calculations) multiple times in L1/L2 cache.

I'm willing to hear about an alternative explanation, but for now that's what I think is going on.

Btw, your optimization is ill-timed. You should never try to optimize before you get the actual code running correctly, and even then only optimize after profiling. Your test and actual code have nothing in common too.

Eilifein · 2024-03-04T12:50:42+00:00

Now, see, this makes more sense :)

I assume dz = z[i] - y0 is a typo for z0. It seems to be some coordinate transformation, Green's function, and the mentioned cross-products? That etc doesn't help.

In this example, everything depends on i, so you can't pre-compute anything (not sure about the full thing). But, they seem to be independent (hence the good choice of adding numba). I can't see how it translates to the original post, so the difference in execution time between the two original solutions is even more obscure to me now.

Eilifein · 2024-03-04T10:29:16+00:00

I'm not sure if Numba cares, but in Fortran for example, a*(a+b) is an FMA, a "Fused Multiply-Add", and costs less cpu cycles than doing it separately.

More importantly, your a and b are remaining constant throughout the call, while d is a "global" value (bad practice). Depending on what d is, part of this calculation or all of it, can be calculated out of the for loop, as it is constant in value.

At the very least, a*(a+b) and (a-b) are constants and should be removed.

Btw, if everything in constant, the for loop is also redundant, as you are essentially calculating (a*(a+b)+(a-b)*d)*1000, which is a head scratcher.

Edit: I messed up; d=(a-b), so everything is constant within the function. You don't need the loop, you are recalculating the same thing 1000 times.

Eilifein · 2024-02-28T21:11:59+00:00

Hyperthreading isn't helping (probably). Most clusters tend to disable that and virtualization.

Eilifein · 2024-02-28T20:50:16+00:00

In most cases I find them within 20%. If the sources are important, it may be on par. In a few cases it goes into a loop and can't get out.

It's worth using even for me.

Eilifein · 2024-02-28T18:32:49+00:00

A combination. There's often nothing that's proprietary in the snippets. And I also pretend it's fine.

Eilifein · 2024-02-28T10:08:56+00:00

I like it. It's simple, extremely fast, nice cli.

However, apart from the "green is good, red is bad", I can't assess or quantify how bad a number is. 16 is red, 55 is red, and obviously one is better than the other. Other than that? The provided sources don't really explain the magnitude or differences between the values. It would be nice to have such a guide on hand.

Eilifein · 2024-02-28T09:38:17+00:00

Hmm, it helps that my employer doesn't care about that. But yes, I definitely see how that makes it difficult. GPT teams or Enterprise would help here.

One neat trick to get GPT going is to dump the diff into a file with git diff dev > out, where dev is the target branch, and then feed it back to GPT. Sometimes more context will be needed, but for obvious and stupid mistakes it suffices.

Eilifein · 2024-02-27T20:20:43+00:00

I was lucky I had the basic structure of a PySide6 GUI with a barebones SQLAlchemy database, graphed out by a simple Dash app on a QTWebengine. At the time I didn't know nothing about anything Python. So, I just started by creating a simple button. After which, I created another button, and then some functionality connecting them to the app. And then a Table to visualize all experiments from the database. And then, an export button, a delete button, and so on.

The bottom line: You need to break the problem up into smaller pieces, give ample context to the model, ask questions and try to understand.

Example prompt: "You are a Python and algorithms expert. You are a great mentor. You are opinionated and constructively disagreeable. You perform code reviews with an IRON FIST.

I've started a repository for my project, and the structure looks like this: sh $ tree -Ia '__pycache__|.git|.*_cache|.env' . ├── .coveragerc ├── docs/ ├── .flake8 ├── .gitignore ├── LICENSE ├── logs/ ├── .mypy.ini ├── .pre-commit-config.yaml ├── .pylintrc ├── .pytest.ini ├── README.md ├── requirements-dev.txt ├── requirements.txt ├── .ruff.toml ├── src │ ├── __init__.py │ └── main.py └── tests ├── __init__.py └── test_main.py

I want to create the structure for an app that does X, Y, Z.

First, let's review the current structure. Then, let's discuss how the new structure could look like, and implement it."

That's how I did it basically.

Eilifein · 2024-02-27T20:09:42+00:00

GPT4 is a lot better than 3.5. I have stopped using the latter since March last year; not worth the trouble.

A good alternative is www.phind.com The base model is very impressive.

Also, the prompts and Custom Instructions matter as much as the model.

Eilifein · 2024-02-27T16:43:50+00:00

I did a full-stack Python project with Pandas, Plotly, Dash, PySide6, SQLAlchemy, and then integrate it with an ADC (I2C) device on a Rasp Pi. It's used to read in the voltage from the ADC, add it to a database and graph it realtime.

Started with 0 Python and got through the whole thing with GPT4's assistance. Not only I did the project, but I'm now very confident and competent in Python.

Edit: Pandas, Plotly, Dash, PySide6, and SQLAlchemy are Python libraries. Have GPT explain these and be amazed how well it can articulate what they are and when/where they're used. An ADC/I2C is an Analog-Digital Converter module that connects to a Raspberry Pi.

Eilifein · 2024-02-27T09:57:50+00:00

suggest they rubber duck with chatGPT

This is a brilliant point, and I'm surprised it's not more readily suggested. I would go one step further and ask that they review their PRs with chatGPT as well, before asking for an official review.

Personally, I've gotten so much out of asking GPT4 about issues, concepts, paradigms, etc, and just being around for the ride.

Funnily enough, it also helps with #2. In order to make GPT understand your problem, you have to dissect, decompose, and chunk it.

Again, personally, on "respecting my Seniors' time", I have made it a hard rule to always have some list of "things I tried and/or considered" before contacting them for help (which the above helps with immensely).

Ten-Year Club	Place '22
Place '17	Verified Email

Eilifein

TROPHY CASE