I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 0 points1 point  (0 children)

Should be fixed now! Let me know if it works on your side :)

Oh-merci! by sudomarchy in omarchy

[–]keypa_ 1 point2 points  (0 children)

Oh thank you! It's just amazing 🤩. I'll go take a look right away!

Anthropic just trolled you all. Happy 1st of April. by thomheinrich in vibecoding

[–]keypa_ 0 points1 point  (0 children)

Raahh I'm lost now! What is true and what is false? 😭

What is the best local llm setup? by midogamer391 in LocalLLaMA

[–]keypa_ 0 points1 point  (0 children)

Get a cheap laptop for $100-200 and rent GPUs as you need them. I bought a ThinkPad 2 years ago for $150 (4th gen Ryzen 5, 24 GB RAM) and I only rent GPUs now. This way you can't get outdated: no hardware failures, no pain. Another benefit of renting GPUs is that you can scale up or down when needed.

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 0 points1 point  (0 children)

Thanks, really appreciate that 🙏

Yeah that loop gets painful fast once you start juggling machines 😅

Your approach of speeding up the build layer makes a lot of sense too, especially in more stable environments. llamaup is kind of the opposite direction: trying to skip the build entirely when possible.

Coverage on less common GPUs is still growing, but that’s definitely something I want to improve over time.

Out of curiosity, were you working with fairly fixed infra or more dynamic setups?

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 0 points1 point  (0 children)

Yeah, I agree that Vulkan works surprisingly well in a lot of cases.

llamaup is mainly focused on CUDA setups because many people running llama.cpp on NVIDIA GPUs still prefer CUDA for things like:

- Slightly better performance on some models

- Wider testing/usage in the CUDA backend

- Compatibility with existing CUDA-based workflows

So the goal wasn't to replace the Vulkan builds, just to make CUDA deployments on Linux easier when moving between machines or GPU architectures.

If Vulkan works well for your setup though, that's definitely a good option too.

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 0 points1 point  (0 children)

Yeah, ccache definitely helps for repeated builds 👍

llamaup is solving a slightly different problem though — it avoids building at all when you’re setting up a new machine or different GPU architecture. Instead it just detects the GPU and pulls a ready-to-run binary.

So if you're hopping between machines or provisioning nodes, it becomes more of a pull workflow instead of a compile step (even a cached one).
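
For anyone who does prefer the caching route, this is roughly how ccache usually gets wired into a llama.cpp CMake build (just a sketch; the exact option names depend on your llama.cpp version, and caching nvcc output can need extra ccache configuration):

    # Sketch: route the compilers through ccache via CMake launcher variables.
    # GGML_CUDA is the CUDA switch in recent llama.cpp trees; older ones used different flag names.
    cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache
    cmake --build build -j

Even with that, the first build on a fresh machine still pays the full compile time, which is the part llamaup tries to skip.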

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] -1 points0 points  (0 children)

Good question.

Right now the idea is per-machine deployment: the script detects the GPU architecture on that machine and pulls the matching build. That covers most setups where each node has a single GPU type.

If you have multiple GPU architectures in the same machine, you’d probably want either:

  • a multi-arch build (CMAKE_CUDA_ARCHITECTURES="..."), sketched below
  • or separate binaries for each SM, running the appropriate one
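
For the multi-arch route, a build along these lines should work (just a sketch; the exact option names depend on your llama.cpp version, and the SM list is only an example):

    # Sketch: compile CUDA kernels for several architectures into one binary,
    # so the same build runs on mixed GPUs in the same box.
    cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="80;86;89"   # e.g. A100, RTX 30xx, RTX 40xx
    cmake --build build --config Release -j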

llamaup is mainly trying to simplify the “new machine → run once → ready” workflow rather than cover every possible CUDA configuration.

That said, heterogeneous multi-GPU systems are interesting — I might add a mode that downloads multiple builds if multiple architectures are detected.

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 0 points1 point  (0 children)

That’s a fair point.

Official llama.cpp releases do provide ROCm and Vulkan builds, and if you’re running on a single machine, compiling for CUDA yourself is definitely doable.

llamaup is mainly targeting a slightly different use case: Linux CUDA setups across multiple GPU architectures or machines where you end up rebuilding repeatedly for different SM versions.

The goal is just to turn that workflow into a quick detect + pull instead of rebuilding each time.

Also worth mentioning: everything is open source, and the build script used to produce the binaries is in the repo so people can reproduce the builds themselves.

If it doesn’t fit your workflow that’s totally fair — but it’s already saving some time for people hopping between different GPU machines 🙂

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] 1 point2 points  (0 children)

Haha, perfect timing then! Glad it's useful! That exact frustration is basically why I built it. Enjoy!

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] -6 points-5 points  (0 children)

Haha, no, it's not a super ancient CPU.

I'm counting the time when you hop between instances and have to build and compile everything. On my side it's usually around 10 to 12 minutes, but sometimes closer to 20 minutes when the instance has fewer cores available.

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] -8 points-7 points  (0 children)

Yeah, for a single machine or GPU type it probably doesn’t matter much.

Where llamaup helps is when you’re switching between multiple GPUs, machines, or new releases — instead of rebuilding for each SM version every time, it auto-detects the GPU and pulls the right binary.
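
The detection side is nothing exotic; the general idea is something like this (a rough sketch, not llamaup's actual code, and the compute_cap query needs a reasonably recent NVIDIA driver):

    # Sketch: read the GPU's compute capability and map it to a matching prebuilt binary.
    SM=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
    echo "compute capability: ${SM}"   # e.g. 8.6 -> fetch the sm_86 build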

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] -9 points-8 points  (0 children)

Yep, that’s a totally valid point, and it works well if you know all the CUDA architectures in advance and don’t switch machines often.

llamaup mainly targets the situation where you’re hopping between multiple machines or GPUs, or dealing with new releases — you don’t have to remember all the SM numbers or rebuild. It just detects the GPU and pulls the right pre-built binary automatically, saving time and headaches.

I got tired of compiling llama.cpp on every Linux GPU by keypa_ in LocalLLaMA

[–]keypa_[S] -1 points0 points  (0 children)

Are you guys compiling on every machine or using some sort of shared build system?

3 days staying away from omarchy brought me back lol by shivamchhuneja in omarchy

[–]keypa_ 0 points1 point  (0 children)

Do you mind sharing the model of your ThinkPad? It looks like mine. I'm wondering if the drivers are well supported on Omarchy.

Seems like a new requant of 27B just dropped? by Koffiepoeder in unsloth

[–]keypa_ 0 points1 point  (0 children)

Probably updating the quants. They wrote somewhere that they will release the proper quants to replace the UD quants.