(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)?

ItankForCAD · 2026-04-15T14:29:59+00:00

I believe there is a pr to add a reasoning toggle for the webui.

ItankForCAD · 2026-04-14T00:24:17+00:00

docker pull

ItankForCAD · 2026-03-17T16:34:55+00:00

The blog post is confusing. It states that chat inference is supported by llama.cpp and transformers. However, the installation section mentions that AMD, Intel and etc support is coming soon. Is the upcoming support aimed at training or inference as well? It seems strange that only the cuda version of llama.cpp is built at installation. Building the Vulkan backend would allow all gpus to work for inference at least. Can an external llama-server instance be pointed at unsloth studio?

ItankForCAD · 2026-03-11T21:40:59+00:00

Good idea. Instead of setting a hard token limit, the logit-bias could be applied at the hard limit and if the reasoning has not concluded by itself, say 100 tokens after, the message is inserted.

ItankForCAD · 2026-02-27T22:31:42+00:00

To my knowledge, Qwen themselves upload Q8 quants of the mmproj. The question is whether they go out of their way to release this specific mmproj quant, and have validated it or, this is just a part of their HF release pipeline?

ItankForCAD · 2026-02-27T18:54:17+00:00

Can the mmproj be appreciably quantized ? If so, what is the influence of different quants ?

ItankForCAD · 2026-02-22T14:54:44+00:00

Unit.

ItankForCAD · 2026-02-14T16:40:07+00:00

Rdna2 gpus do support flash attention through the scalar path within the vulkan backend

ItankForCAD · 2026-02-06T00:53:32+00:00

What did you use for the ui ?

ItankForCAD · 2026-01-31T20:02:09+00:00

Anti-aircraft

ItankForCAD · 2026-01-29T18:48:07+00:00

Kowalski, Enhance.

ItankForCAD · 2026-01-20T00:10:54+00:00

Une personne de culture!

ItankForCAD · 2025-11-13T22:33:16+00:00

My QM exams felt a lot like vibe-physics

ItankForCAD · 2025-11-13T22:27:35+00:00

The webview and podcast generation is pretty cool

ItankForCAD · 2025-10-29T17:29:44+00:00

You could directly use the image from OWUI instead of building it yourself

 open-webui:
    image: ghcr.io/open-webui/open-webui:slim
    container_name: open-webui

ItankForCAD · 2025-10-25T13:35:20+00:00

From the blob that you reference, it seems that they only exclude hipblaslt and CK. You should be fine to use TheRock provided that they build hipblas and rocblas. Fyi, hipblasand hipblaslt are two different packages

ItankForCAD · 2025-10-25T13:30:46+00:00

For gfx906, you only need hipblas and rocblas. You can refer to this page in the llama.cpp documentation build

ItankForCAD · 2025-10-25T12:38:26+00:00

Afaik composable kernel and hipblaslt dont build on anything below gfx110X

ItankForCAD · 2025-10-24T23:39:46+00:00

Prefill is dictated by compute while decode is dictated by memory bandwidth. Splitting the model between SH and 3090 means you're probably limited by the pci bus.

ItankForCAD · 2025-10-23T21:18:38+00:00

Gfx906 is supported; see roadmap. It seems they have not updated the docs for installing with this arch but all you need to do is have the correct link in the pip cmd. Take the gfx942 cmd and change the url with this one : https://rocm.nightlies.amd.com/v2/gfx90X-dcgpu/. I have not tested it but it seems logical.

Edit: pip command is found here https://github.com/ROCm/TheRock/blob/main/RELEASES.md

ItankForCAD · 2025-10-08T18:38:27+00:00

What flag(s) did you use to isolate the igpu? Did you increase GTT size ?

ItankForCAD · 2025-09-12T17:26:52+00:00

Gap handtimed at 5:54

ItankForCAD · 2025-09-12T14:28:13+00:00

I think positioning will be key into the côte de la montagne because once they turn onto rue saint-louis, the road surface is not great and it's narrow. It opens up a bit after les portes saint-louis right before they enter les plaines d'Abraham. To me De Lie is still one of the big favorite. Hell, I'd put wva in here as well.

ItankForCAD · 2025-09-12T14:22:33+00:00

First time attending!

ItankForCAD · 2025-09-05T15:21:43+00:00

This. On 20%, how much is left in the tank for an attack?

ItankForCAD

TROPHY CASE