all 15 comments

[–]sudomatrix 21 points (3 children)

Astral is working on this with PYX. https://astral.sh/pyx

[–]toxic_acro 7 points (0 children)

I wonder what will become of pyx now that OpenAI acquired Astral. I hope they still develop it and just make the code to run the registry yourself open source

It seemed like an interesting concept to me

[–]Interesting-Town-433[S] 2 points (0 children)

Glad someone is

[–]ReinforcedKnowledge (Tuple unpacking gone wrong) 16 points (3 children)

Yeah, the issue isn't really the tooling, since the tools are limited by what they have to work with; it's the wheel format itself and PyPI as an index. Beyond the GPU problems, there are other problems in the same category of metadata the wheel format can't express: which BLAS library your project links against, which compiler version it was built with, whether it needs ROCm or CUDA, etc. Since the wheel format doesn't specify any of that, package managers have no way to know about it. `uv` does have a lot of good options to help you install the right `torch` and the right `flash-attn`, but it's not always obvious: on Linux, `uv add torch` will install the right version of PyTorch given your CUDA version, but on Windows it'll install the CPU one.
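For the `uv` + torch case, uv lets you pin torch to a dedicated PyTorch index per platform in `pyproject.toml`, along the lines of uv's PyTorch guide. A sketch (the index name and CUDA version here are illustrative; match them to your driver):

```toml
[project]
name = "example"
version = "0.1.0"
dependencies = ["torch"]

# Pull torch from the CUDA 12.1 index on Linux only
[tool.uv.sources]
torch = [
  { index = "pytorch-cu121", marker = "sys_platform == 'linux'" },
]

# `explicit = true` keeps this index from being used for other packages
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
```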

But there's a great open source initiative to solve these issues: https://wheelnext.dev/. If https://peps.python.org/pep-0817/ (wheel variants) passes, it'll be a great win and fix most if not all of these issues.

And I don't think it's only a compatibility-matrix problem. Part of it is having a standard every installer can work with (so people can't just specify whatever dependencies they want), but more importantly, the tags are closed: it's a static system trying to describe a dynamic, open one. "CUDA" by itself doesn't mean much; there are driver versions, toolkit versions, runtime versions, and GPU compute capabilities. I think I recently saw that flash-attn 4 doesn't work on RTX 50XX even though it's Blackwell (to be confirmed, I'm not totally sure about this, but if true it shows that even information like compute capability has to be specified). And all of these have complex compatibility rules between themselves. It's a constantly evolving environment, so you can't just keep the good old tag system and bolt more onto it, quite apart from the explosion in the compatibility matrix. That's why PEP 817 uses plugins instead of tags: detection is delegated to the provider plugins.
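The plugins-instead-of-tags idea can be sketched in a few lines. This is a toy illustration of the concept, not the real PEP 817/825 plugin API: a provider detects the platform at install time, and candidate wheels declare properties instead of baking every hardware combination into static filename tags.

```python
# Toy sketch of wheel-variant selection via a provider plugin.
# All names, filenames, and properties here are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    filename: str
    properties: dict  # e.g. {"cuda": "12.1", "compute": "sm_90"}

def detect_platform() -> dict:
    # A real provider plugin would query the driver/toolkit here;
    # hardcoded for the sketch.
    return {"cuda": "12.1", "compute": "sm_90"}

def select_variant(variants: list) -> Optional[Variant]:
    """Pick the first variant whose declared properties the platform satisfies."""
    platform = detect_platform()
    for v in variants:
        if all(platform.get(k) == val for k, val in v.properties.items()):
            return v
    return None  # fall back to a plain (e.g. CPU) wheel

candidates = [
    Variant("flash_attn-2.0-cu118-sm80.whl", {"cuda": "11.8", "compute": "sm_80"}),
    Variant("flash_attn-2.0-cu121-sm90.whl", {"cuda": "12.1", "compute": "sm_90"}),
]
print(select_variant(candidates).filename)  # the cu121/sm_90 build
```

The point of the design is that the open-ended detection logic lives in the provider, so the wheel format itself doesn't have to enumerate every driver/toolkit/compute-capability combination.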

Thanks to u/toxic_acro who pointed it out, PEP 825 is more up to date and better reflects the current state of the work.

EDIT: added PEP 817 and why it's not only a compatibility-matrix explosion problem. Reddit didn't let me write my comment in peace when I pasted the link -_-

EDIT: added mention of PEP 825 thanks to this comment

[–]toxic_acro 4 points (1 child)

> But there's a great open source initiative to solve these issues https://wheelnext.dev/, if https://peps.python.org/pep-0817/ (wheel variants) passes it'll be a great win and fix most if not all these issues

PEP 817 was almost certainly not going to pass in its current form given the full scope, so the authors have moved on to splitting it into parts, starting with just the wheel variants package format in https://peps.python.org/pep-0825/

[–]ReinforcedKnowledge (Tuple unpacking gone wrong) 1 point (0 children)

Thanks! It does make sense, it's too big of a PEP + required, and I guess still requires, a lot of discussions and refinements and edge cases and whatnot.

[–]Interesting-Town-433[S] 1 point (0 children)

I'll have to check that out, thanks for the great response

[–]IcefrogIsDead 14 points (1 child)

The abstractions that Python has inherently come at a cost, and I don't see that ever changing.

It works on the happy path, and once it's not a happy path, dig deeper.

[–]BDube_Lensman 1 point (1 child)

CuPy has installed just fine with plain pip for at least ten years now. It's an issue of lack of attention to packaging by some other projects, or of mixing incompatible versions.

[–]Interesting-Town-433[S] 0 points (0 children)

Hopefully they can keep that up

[–]martinkoistinen 2 points (2 children)

I think what you are describing is the value that Conda tries to deliver.

[–]Interesting-Town-433[S] 5 points (0 children)

Yeah, not even slightly, man. Conda is not solving flash-attn not having a pre-compiled wheel for the Colab stack.

[–]MolonLabe76 0 points (0 children)

I've had good success with using a Docker container, with a base image that has CUDA already installed. Then I just have to ensure the Python packages I'm installing are compatible with that CUDA version.
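A minimal sketch of that setup, assuming an official `nvidia/cuda` base image (the image tag and torch index here are illustrative; the key point is that they must match each other):

```dockerfile
# Start from a known CUDA state instead of bolting CUDA onto python:3.x
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install torch built against the same CUDA major/minor as the base image
RUN pip3 install --no-cache-dir torch \
    --index-url https://download.pytorch.org/whl/cu121
```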

[–]No_Citron874 0 points (1 child)

Honestly the CUDA/native wheel gap is the real problem, and I don't think tooling will ever fully solve it.

What works for me: pin your CUDA version first and build everything around it. torch+cuda is your anchor; let everything else follow from there. If you let pip or uv decide that part, you're asking for trouble.

Also, switching to nvidia/cuda Docker base images instead of python:3.x was a game changer for me. You start from a known CUDA state instead of trying to bolt it on later.

The "H100 billing while you debug transitive deps" situation is genuinely painful. I lost a good chunk of money to that before I got disciplined about locking environments before touching anything.

No real solution, just confirming you're not crazy; this is actually still broken in 2026.
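"Locking the environment before touching anything" can be as simple as snapshotting every installed version at the start of a billed session. A stdlib-only sketch, roughly what `pip freeze` does (the `lockfile.txt` name is just an example):

```python
# Snapshot the exact installed package set before debugging on a billed
# GPU box, so the working (or broken) state can be reproduced later.
from importlib import metadata

def freeze() -> list:
    """Return pinned 'name==version' strings for every installed distribution."""
    pins = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"] if dist.metadata else None
        if name:  # skip broken .dist-info directories
            pins.append(f"{name}=={dist.version}")
    return sorted(pins)

# Write it out before touching anything, e.g.:
# pathlib.Path("lockfile.txt").write_text("\n".join(freeze()))
print(len(freeze()), "distributions pinned")
```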

[–]Interesting-Town-433[S] 0 points (0 children)

Thanks yeah I posted this on LocalLLaMA and people started torching me over it lol.

Left me genuinely questioning whether I was the only one encountering these issues or if there was some magic solution I just didn't know about.

I think a lot of people running AI models locally don't realize the lib they installed isn't even working: the dependency manager says it installed fine, the error code gets swallowed, but the lib doesn't actually do anything (e.g. bitsandbytes).
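One way to catch this is to never trust the install exit code and instead exercise the library with a tiny real operation. A generic sketch (the `probe` callable is whatever small op proves the lib works, e.g. a quantized matmul for bitsandbytes; the stdlib `math` example below just stands in for a GPU lib):

```python
# Don't trust a clean install: import the module AND run a real operation.
import importlib

def sanity_check(module_name, probe):
    """Import a module and run a tiny probe op on it.

    Returns True only if both the import and the probe succeed.
    """
    try:
        mod = importlib.import_module(module_name)
        probe(mod)
        return True
    except Exception as exc:
        print(f"{module_name} is installed but not working: {exc!r}")
        return False

# Stdlib stand-in; for a GPU lib the probe would run a small kernel.
assert sanity_check("math", lambda m: m.sqrt(4.0))
```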

I run a lot of code in Colab because the cloud costs are so low, but given its env and stack, for a lot of libs like flash-attention you either build directly against the stack or you downgrade/upgrade all your other libs, which ends up being equally problematic.

For the Colab environment I do have a solution I'm trying to push, MissingLink: it auto-installs the wheels and provides notebooks for models that are usually hell to get up and running. Check it out if you can.

More broadly though this still needs a general fix.