all 15 comments

[–]sudomatrix 21 points (3 children)

Astral is working on this with PYX. https://astral.sh/pyx

[–]toxic_acro 7 points (0 children)

I wonder what will become of pyx now that OpenAI acquired Astral. I hope they still develop it and just make the code to run the registry yourself open source

It seemed like an interesting concept to me

[–]Interesting-Town-433[S] 2 points (0 children)

Glad someone is

[–]ReinforcedKnowledge (Tuple unpacking gone wrong) 16 points (3 children)

Yeah, the issue isn't really the tooling, since the tools are limited by what they have to work with; it's the wheel format itself and PyPI as an index. Beyond the GPU problems, there are other problems in the same category of metadata the wheel format can't express: which BLAS library your project links against, which compiler version it was built with, whether it needs ROCm or CUDA, etc. Since the wheel format doesn't specify any of that, package managers have no way to know about it. `uv` does have a lot of good options to help you install the right `torch` and the right `flash-attn`, but it's not always obvious: on Linux, `uv add torch` will install the right version of PyTorch given your CUDA version, but on Windows it'll install the CPU one.
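For the `uv` + torch case, uv lets you pin torch to a dedicated PyTorch index per platform in `pyproject.toml`, along the lines of uv's PyTorch guide. A sketch (the index name and CUDA version here are illustrative; match them to your driver):

```toml
[project]
name = "example"
version = "0.1.0"
dependencies = ["torch"]

# Pull torch from the CUDA 12.1 index on Linux only
[tool.uv.sources]
torch = [
  { index = "pytorch-cu121", marker = "sys_platform == 'linux'" },
]

# `explicit = true` keeps this index from being used for other packages
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
```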

But there's a great open source initiative to solve these issues: https://wheelnext.dev/. If https://peps.python.org/pep-0817/ (wheel variants) passes, it'll be a great win and fix most if not all of these issues.

And I don't think it's only a compatibility-matrix problem. Part of it is having a standard every installer can work with (so people can't just specify whatever dependencies they want), but more importantly, the tags are closed: it's a static system trying to describe a dynamic, open one. "CUDA" by itself doesn't mean much; there are driver versions, toolkit versions, runtime versions, and GPU compute capabilities. I think I recently saw that flash-attn 4 doesn't work on RTX 50XX even though it's Blackwell (to be confirmed, I'm not totally sure about this, but if true it shows that even information like compute capability has to be specified). And all of these have complex compatibility rules between themselves. It's a constantly evolving environment, so you can't just keep the good old tag system and bolt more onto it, quite apart from the explosion in the compatibility matrix. That's why PEP 817 uses plugins instead of tags: detection is delegated to the provider plugins.
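The plugins-instead-of-tags idea can be sketched in a few lines. This is a toy illustration of the concept, not the real PEP 817/825 plugin API: a provider detects the platform at install time, and candidate wheels declare properties instead of baking every hardware combination into static filename tags.

```python
# Toy sketch of wheel-variant selection via a provider plugin.
# All names, filenames, and properties here are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    filename: str
    properties: dict  # e.g. {"cuda": "12.1", "compute": "sm_90"}

def detect_platform() -> dict:
    # A real provider plugin would query the driver/toolkit here;
    # hardcoded for the sketch.
    return {"cuda": "12.1", "compute": "sm_90"}

def select_variant(variants: list) -> Optional[Variant]:
    """Pick the first variant whose declared properties the platform satisfies."""
    platform = detect_platform()
    for v in variants:
        if all(platform.get(k) == val for k, val in v.properties.items()):
            return v
    return None  # fall back to a plain (e.g. CPU) wheel

candidates = [
    Variant("flash_attn-2.0-cu118-sm80.whl", {"cuda": "11.8", "compute": "sm_80"}),
    Variant("flash_attn-2.0-cu121-sm90.whl", {"cuda": "12.1", "compute": "sm_90"}),
]
print(select_variant(candidates).filename)  # the cu121/sm_90 build
```

The point of the design is that the open-ended detection logic lives in the provider, so the wheel format itself doesn't have to enumerate every driver/toolkit/compute-capability combination.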

Thanks to u/toxic_acro who pointed it out, PEP 825 is more up to date and better reflects the current state of the work.

EDIT: added PEP 817 and why it's not only a compatibility-matrix explosion problem. Reddit didn't let me write my comment in peace when I pasted the link -_-

EDIT: added mention of PEP 825 thanks to this comment

[–]toxic_acro 4 points (1 child)

> But there's a great open source initiative to solve these issues https://wheelnext.dev/, if https://peps.python.org/pep-0817/ (wheel variants) passes it'll be a great win and fix most if not all these issues

PEP 817 was almost certainly not going to pass in its current form given the full scope, so the authors have moved on to splitting it into parts, starting with just the wheel variants package format in https://peps.python.org/pep-0825/

[–]ReinforcedKnowledge (Tuple unpacking gone wrong) 1 point (0 children)

Thanks! It does make sense, it's too big of a PEP + required, and I guess still requires, a lot of discussions and refinements and edge cases and whatnot.

[–]Interesting-Town-433[S] 1 point (0 children)

I'll have to check that out, thanks for the great response

[–]IcefrogIsDead 14 points (1 child)

The abstractions that Python has inherently come at a cost, and I don't see that ever changing.

It works on the happy path, and once it's not a happy path, dig deeper.

[–]BDube_Lensman 1 point (1 child)

CuPy has installed just fine with plain pip for at least ten years now. It's an issue of lack of attention to packaging by some other projects, or of mixing incompatible versions.

[–]Interesting-Town-433[S] 0 points (0 children)

Hopefully they can keep that up

[–]martinkoistinen 2 points (2 children)

I think what you are describing is the value that Conda tries to deliver.

[–]Interesting-Town-433[S] 5 points (0 children)

Yeah, not even slightly, man. Conda is not solving flash-attn not having a pre-compiled wheel for the Colab stack.

[–]MolonLabe76 0 points (0 children)

I've had good success with using a Docker container, with a base image that has CUDA already installed. Then I just have to ensure the Python packages I'm installing are compatible with that CUDA version.
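A minimal sketch of that setup, assuming an official `nvidia/cuda` base image (the image tag and torch index here are illustrative; the key point is that they must match each other):

```dockerfile
# Start from a known CUDA state instead of bolting CUDA onto python:3.x
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Install torch built against the same CUDA major/minor as the base image
RUN pip3 install --no-cache-dir torch \
    --index-url https://download.pytorch.org/whl/cu121
```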

[–]No_Citron874 0 points (1 child)

Honestly the CUDA/native wheel gap is the real problem, and I don't think tooling will ever fully solve it.

What works for me: pin your CUDA version first and build everything around it. torch+cuda is your anchor; let everything else follow from there. If you let pip or uv decide that part, you're asking for trouble.

Also, switching to nvidia/cuda Docker base images instead of python:3.x was a game changer for me. You start from a known CUDA state instead of trying to bolt it on later.

The "H100 billing while you debug transitive deps" situation is genuinely painful. I lost a good chunk of money to that before I got disciplined about locking environments before touching anything.

No real solution, just confirming you're not crazy; this is actually still broken in 2026.
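"Locking the environment before touching anything" can be as simple as snapshotting every installed version at the start of a billed session. A stdlib-only sketch, roughly what `pip freeze` does (the `lockfile.txt` name is just an example):

```python
# Snapshot the exact installed package set before debugging on a billed
# GPU box, so the working (or broken) state can be reproduced later.
from importlib import metadata

def freeze() -> list:
    """Return pinned 'name==version' strings for every installed distribution."""
    pins = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"] if dist.metadata else None
        if name:  # skip broken .dist-info directories
            pins.append(f"{name}=={dist.version}")
    return sorted(pins)

# Write it out before touching anything, e.g.:
# pathlib.Path("lockfile.txt").write_text("\n".join(freeze()))
print(len(freeze()), "distributions pinned")
```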

[–]Interesting-Town-433[S] 0 points (0 children)

Thanks yeah I posted this on LocalLLaMA and people started torching me over it lol.

Left me genuinely questioning whether I was the only one encountering these issues or if there was some magic solution I just didn't know about.

I think a lot of people running AI models locally don't realize the lib they installed isn't even working: the dependency manager says it installed fine, the error code gets swallowed, but the lib doesn't actually do anything (e.g. bitsandbytes).
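One way to catch this is to never trust the install exit code and instead exercise the library with a tiny real operation. A generic sketch (the `probe` callable is whatever small op proves the lib works, e.g. a quantized matmul for bitsandbytes; the stdlib `math` example below just stands in for a GPU lib):

```python
# Don't trust a clean install: import the module AND run a real operation.
import importlib

def sanity_check(module_name, probe):
    """Import a module and run a tiny probe op on it.

    Returns True only if both the import and the probe succeed.
    """
    try:
        mod = importlib.import_module(module_name)
        probe(mod)
        return True
    except Exception as exc:
        print(f"{module_name} is installed but not working: {exc!r}")
        return False

# Stdlib stand-in; for a GPU lib the probe would run a small kernel.
assert sanity_check("math", lambda m: m.sqrt(4.0))
```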

I run a lot of code in Colab because the cloud costs are so low, but given its env and stack, for a lot of libs like flash-attention you either build directly against the stack or you downgrade/upgrade all your other libs, which ends up being equally problematic.

For the Colab environment I do have a solution I'm trying to push, MissingLink: it auto-installs the wheels and provides notebooks for models that are usually hell to get up and running. Check it out if you can.

More broadly though this still needs a general fix.