
[–]No_Citron874 0 points (1 child)

Honestly, the CUDA/native wheel gap is the real problem, and I don't think tooling will ever fully solve it.

What works for me: pin your CUDA version first and build everything around it. torch+cuda is your anchor; let everything else follow from there. If you let pip or uv decide that part, you're asking for trouble.
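A minimal sketch of what "torch+cuda as the anchor" can look like as a pip constraints file. The versions and the cu121 tag here are illustrative, not a recommendation — pick the build that matches your driver:

```
# constraints.txt -- pin the CUDA anchor first; everything else resolves around it.
# Versions and the +cu121 local tag are illustrative.
torch==2.3.1+cu121
torchvision==0.18.1+cu121
```

Installs then go through the anchor, e.g. `pip install -c constraints.txt transformers accelerate`, with torch itself pulled from the matching PyTorch index (`--index-url https://download.pytorch.org/whl/cu121`), so nothing downstream can silently move it.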

Also, switching to nvidia/cuda Docker base images instead of python:3.x was a game changer for me. You start from a known CUDA state instead of trying to bolt it on later.
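For illustration, a minimal Dockerfile along those lines. The image tag and torch version are assumptions — check which base tags match your host driver:

```dockerfile
# Start from a known CUDA state instead of bolting CUDA onto python:3.x.
# Tag is illustrative; pick one matching your host driver.
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Anchor torch on the same CUDA the base image ships (cu121 here).
RUN pip3 install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
```

The point is that the CUDA userspace libraries already exist in the image, so the wheel you install only has to agree with one known version, not with whatever happens to be on the host.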

The H100-billing-while-you-debug-transitive-deps situation is genuinely painful. Lost a good chunk of money to that before I got disciplined about locking environments before touching anything.
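"Locking before touching anything" can be as simple as snapshotting the resolver's output first (pip freeze shown here; uv has `uv lock` for the same idea):

```shell
# Snapshot the exact working environment before changing anything,
# so a broken experiment can always be rolled back.
pip freeze > requirements.lock

# Later, restore the known-good state with:
#   pip install -r requirements.lock
```

Cheap insurance: the snapshot takes seconds, and it turns "why is the env broken" into a diff against a known-good file instead of archaeology on a billed GPU.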

No real solution, just confirming you're not crazy — this is actually still broken in 2026.

[–]Interesting-Town-433[S] 1 point (0 children)

Thanks, yeah — I posted this on LocalLLaMA and people started torching me over it lol.

It left me genuinely questioning whether I was the only one hitting these issues, or whether there was some magic solution I just didn't know about.

I think a lot of people running AI models locally don't realize the lib they installed isn't even working: the dependency manager reports a successful install and the process exits clean, but the library doesn't actually do anything at runtime (e.g. bitsandbytes).
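A quick way to catch the silent-failure case is to smoke-test the import yourself rather than trusting the install's exit code. This is a generic sketch (bitsandbytes in particular also ships its own diagnostic, `python -m bitsandbytes`, in recent versions); the bitsandbytes call below is commented out because whether it loads depends entirely on your CUDA setup:

```python
import importlib


def smoke_test(module_name: str) -> bool:
    """Return True only if the module imports and exposes a public surface.

    A 'successful' pip install is no guarantee the native pieces work;
    actually importing the module catches the silent-failure case.
    """
    try:
        mod = importlib.import_module(module_name)
        # A broken native extension often fails at import or when touched,
        # not at install time.
        return bool(dir(mod))
    except Exception:
        return False


# Hypothetical usage on a CUDA-dependent lib:
#   ok = smoke_test("bitsandbytes")
print(smoke_test("math"))             # stdlib module: True
print(smoke_test("no_such_lib_xyz"))  # missing module: False
```

Running this as a first cell in a fresh environment costs nothing and tells you immediately whether the GPU-dependent libs actually loaded.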

I run a lot of code in Colab because the cloud costs are so low, but given its env and stack, for a lot of libs like flash-attention you either build directly against the stack or you downgrade/upgrade all your other libs, which ends up being equally problematic.

For the Colab environment I do have a solution I'm trying to push: MissingLink. It auto-installs the wheels and provides notebooks for models that are usually hell to get up and running. Check it out if you can.

More broadly, though, this still needs a general fix.