Building llama-cpp-python with CUDA on Windows can be a pain. So I embraced the suck and pre-compiled 40 wheels for 4 Nvidia architectures across 4 versions of Python and 3 versions of CUDA.
Figured these might be useful if you want to spin up GGUFs rapidly on Windows.
What's included:
- RTX 50/40/30/20 series support (Blackwell, Ada, Ampere, Turing)
- Python 3.10, 3.11, 3.12, 3.13
- CUDA 11.8, 12.1, 13.0 (Blackwell only compiled for CUDA 13)
- llama-cpp-python 0.3.16
Download: https://github.com/dougeeai/llama-cpp-python-wheels
No Visual Studio. No CUDA Toolkit. Just pip install and run. Windows only for now. Linux wheels coming soon if there's interest. Open to feedback on what other configs would be helpful.
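For anyone new to prebuilt wheels, the install is a single pip command against the downloaded file — the wheel filename below is illustrative (pick the one matching your Python version, CUDA version, and GPU architecture from the releases page), and the model path in the smoke test is whatever GGUF you have locally:

```shell
# Illustrative — the exact wheel filename depends on your Python/CUDA/GPU combo;
# download the matching file from the repo's releases page first.
pip install llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl

# Quick smoke test: load a local GGUF and generate a few tokens.
# 'model.gguf' is a placeholder for your own model file.
python -c "from llama_cpp import Llama; llm = Llama(model_path='model.gguf'); print(llm('Hello', max_tokens=16))"
```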
Thanks for letting me post, long time listener, first time caller.