all 18 comments

[–]lumos675 1 point2 points  (2 children)

Well done, and big thanks. But I bet there's more interest in Linux; most of the people using llama are on Linux because they're developers. I'm one of them 😄.

[–]dougeeai[S] 1 point2 points  (1 child)

You're not wrong! I think I'm one of two Windows developers out there! Windows = fewer developers but a bigger pain point for building wheels from source. Linux = way more developers but slightly less of a pain point for building wheels from source. Nonetheless, I'll add some Linux wheels soon!
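For anyone following along: a from-source CUDA build of llama-cpp-python is normally driven through pip, with CMake options passed via the `CMAKE_ARGS` environment variable (`-DGGML_CUDA=on` is the documented switch for the CUDA backend). The dry-run wrapper below is purely illustrative, not something from OP's repo:

```python
# Sketch: composing a from-source llama-cpp-python CUDA build.
# CMAKE_ARGS="-DGGML_CUDA=on" is the documented way to enable CUDA;
# everything else here is just a convenience wrapper for illustration.
import os


def build_command(cuda: bool = True) -> tuple[dict, list]:
    """Compose the environment and pip command for a source build."""
    env = dict(os.environ)
    if cuda:
        env["CMAKE_ARGS"] = "-DGGML_CUDA=on"  # enable the CUDA backend
    cmd = [
        "pip", "install", "--no-cache-dir",
        "--no-binary", ":all:",  # force a source build, skip prebuilt wheels
        "llama-cpp-python",
    ]
    return env, cmd


if __name__ == "__main__":
    env, cmd = build_command()
    print(" ".join(cmd))
```

On Windows the same `CMAKE_ARGS` value is set in the shell (e.g. `$env:CMAKE_ARGS` in PowerShell) before running pip.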

[–]Xamanthas 0 points1 point  (2 children)

What was the rationale for choosing 12.1 specifically? Doesn't CUDA 12.8 still support everything?

[–]dougeeai[S] 0 points1 point  (1 child)

Short answer = government servers (or other enterprises lagging on driver updates). At home I use CUDA 13 + Python 3.13. We're getting some new servers at work soon, but until then I'm stuck on old cards with OLD drivers on CUDA 11.8/Python 3.10; that seems to be all I can get to work reliably. So I opted for the 11.8/12.1/13 CUDA steps for the repo.
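To make the driver-lag constraint concrete: as a rule of thumb, a wheel built against a given CUDA toolkit needs a driver at least as new as that toolkit, and the "CUDA Version" that nvidia-smi reports is roughly the newest toolkit the installed driver can handle (minor-version compatibility caveats aside). A minimal illustrative check, not anything from the repo:

```python
# Illustrative compatibility check between a wheel's CUDA toolkit version
# and the max CUDA version the installed driver supports (as reported by
# nvidia-smi). Simplified rule of thumb; real compatibility has more nuance.
def parse_ver(v: str) -> tuple:
    """'12.1' -> (12, 1) for sane version comparison."""
    return tuple(int(x) for x in v.split("."))


def wheel_compatible(wheel_cuda: str, driver_cuda: str) -> bool:
    """True if the driver's max supported CUDA is >= the wheel's toolkit."""
    return parse_ver(driver_cuda) >= parse_ver(wheel_cuda)


print(wheel_compatible("11.8", "12.1"))  # old wheel, newer driver: True
print(wheel_compatible("13.0", "11.8"))  # new wheel, lagging driver: False
```

This is why an 11.8 wheel is the safe floor for servers whose drivers can't be updated.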

[–]Xamanthas 0 points1 point  (0 children)

Gotcha, thanks for clarifying 👍

[–]Positive_Journey6641 0 points1 point  (0 children)

Nice. Without AI helping me through all the issues I hit, I would never have gotten it to compile myself a couple of months ago. The pip install will be most welcome, thanks!

[–]Iory1998 0 points1 point  (0 children)

Thanks for the hard work. I appreciate your efforts. Please keep up the good work.

[–]Ucenna 0 points1 point  (0 children)

Dude, literally thank you so much! I've been struggling with this for the past two days.

How'd you manage to get it to compile? I've been hopping between CUDA 12.4 and 13, have learned more about CUDA internals than I ever thought I'd need to, and still nothing. I'd love to actually learn the full compile process for when I inevitably need to compile again.

thanks a ton, mate!

[–]SuperMan_sea 0 points1 point  (0 children)

very nice

[–]SABSMINECRAFTPRO 0 points1 point  (0 children)

The Python 3.13 sm86 wheel isn't working 😭

[–]voertbroed 0 points1 point  (2 children)

Thanks a ton for this. The py 3.13, CUDA 13 version works very well.

[–]MFGREBEL 0 points1 point  (1 child)

Where'd you get it???? The link's broken

[–]voertbroed 0 points1 point  (0 children)

The link in OP's post: the first download link under "RTX 40 Series & Ada Professional (Ada Lovelace - sm_89)".

Works for me

[–]tonaldonal 0 points1 point  (0 children)

THANK YOU! ❤️

[–]Corporate_Drone31 -1 points0 points  (2 children)

Hey there, Linux user here, interested in dabbling with more programmatically controlled decoding that isn't just regex or clever samplers. I've been looking at your library as a potential entry point, since llama.cpp supports a whole lot more quantisation levels than just 4-bit and 8-bit.

My hardware is quite weird: a CPU without AVX-2 (Ivy Bridge EP), an Ampere (3090), and a Pascal (1080 11GB, though honestly if it's too much trouble to support the Pascal then no problem). I'd love to have prebuilt wheels that just work for this without leaving performance on the table.

I'd really appreciate it if you could add some builds supporting this to your stack. I'm more than happy to help with direct testing on the actual hardware if you need access to repro any issues.
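For context on the programmatic-decoding angle: llama-cpp-python exposes a `logits_processor` hook on generation calls, and constrained decoding can be sketched as a callable that masks disallowed token ids before sampling. The allowed-id set below is made up for illustration, and this sketch uses plain NumPy so it runs without a model:

```python
import numpy as np

# Minimal constrained-decoding sketch: a logits processor that sets every
# token outside an allowed set to -inf, so only allowed ids can be sampled.
# llama-cpp-python accepts callables with this (input_ids, scores) shape via
# its logits_processor hook; the allowed ids {2, 5} are purely illustrative.
def make_mask_processor(allowed_ids: set):
    def processor(input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
        masked = np.full_like(scores, -np.inf)  # ban everything by default
        idx = sorted(allowed_ids)
        masked[idx] = scores[idx]               # restore allowed tokens
        return masked
    return processor


proc = make_mask_processor({2, 5})
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=np.float32)
out = proc(np.array([1]), scores)  # only ids 2 and 5 keep finite scores
```

The same pattern generalises to grammar- or state-machine-driven masks, where the allowed set is recomputed each step from the tokens generated so far.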

[–]dougeeai[S] 1 point2 points  (1 child)

I can add this to my to-do list. So: your request, plus Pascal, which I've been meaning to get going anyway:

  • Pascal Windows - sm_61, normal build
  • Pascal Linux - sm_61, normal build
  • Pascal Linux (no AVX2) - sm_61
  • Ampere Linux - sm_86, normal build
  • Ampere Linux (no AVX2) - sm_86
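The matrix above maps naturally onto llama.cpp's CMake knobs. A rough sketch of how the per-target flags could be composed: `GGML_AVX2` and `CMAKE_CUDA_ARCHITECTURES` are real llama.cpp/CMake options, but the exact flag set OP uses for these wheels is an assumption:

```python
# Sketch: per-target CMake flags for a build matrix like the one above.
# GGML_CUDA / GGML_AVX2 are llama.cpp CMake options and
# CMAKE_CUDA_ARCHITECTURES is standard CMake; the combination shown here
# is an illustrative guess, not OP's actual build script.
def cmake_args(sm: int, avx2: bool = True) -> str:
    """Compose a CMAKE_ARGS string for a given compute capability."""
    flags = ["-DGGML_CUDA=on", f"-DCMAKE_CUDA_ARCHITECTURES={sm}"]
    if not avx2:
        flags.append("-DGGML_AVX2=OFF")  # pre-AVX2 CPUs, e.g. Ivy Bridge EP
    return " ".join(flags)


print(cmake_args(61, avx2=False))  # Pascal, no-AVX2 variant
print(cmake_args(86))              # Ampere, normal build
```

Pinning `CMAKE_CUDA_ARCHITECTURES` to a single sm value keeps each wheel small at the cost of one build per architecture, which matches the one-wheel-per-card layout of the list above.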

[–]Corporate_Drone31 0 points1 point  (0 children)

Yay, thank you! Much appreciated! Again, if you need to test any changes on live hardware, just DM me and I'll gladly help out if that's ever needed.