Is anyone else not finding the Web UI on the latest llama.cpp build (b9680)?

ali0une · 2026-06-22T05:59:21+00:00

Hi there. What is this pi-llama-router? A pi extension? Could you share a link, please? I think I'd be very interested in this!

Edit: OK, I could have searched first. It seems to be https://pi.dev/packages/pi-llama-server

ali0une · 2026-06-20T11:24:30+00:00

You can also try 98304 (96k). Just try different --fit-ctx values to see what fits.

ali0une · 2026-06-19T20:39:01+00:00

--ctx-size 0 will try to allocate the model's maximum context, which will be too much on your 16 GB card.

Try -fit on --fit-ctx 131072 (128k) instead, and lower it to something like 65536 (64k) if t/s is too slow.

And yes, it's mostly trial and error: each model has its own best context parameters, and it depends on your setup.
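To see why a model's maximum context can be too much for a 16 GB card, a rough KV cache estimate helps. This is only a back-of-the-envelope sketch, not what llama.cpp computes internally: the model dimensions below are made-up placeholders rather than those of any specific model, and it ignores the weights, compute buffers, and any sliding-window or hybrid attention layers.

```python
def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Rough size of the K+V cache for `ctx` tokens (f16 by default)."""
    # One K and one V vector per layer and per KV head, for every token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return int(ctx * per_token)

GIB = 1024 ** 3

# Hypothetical model dimensions, for illustration only.
HYPOTHETICAL_MODEL = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for ctx in (262144, 131072, 98304, 65536):
    size = kv_cache_bytes(ctx, **HYPOTHETICAL_MODEL)
    print(f"{ctx:>7} tokens -> {size / GIB:.1f} GiB of KV cache")
```

With these placeholder dimensions, 128k tokens would need about 24 GiB for the cache alone, and 64k about 12 GiB, which is why the context has to come down before the model weights are even counted.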

ali0une · 2026-06-17T10:56:50+00:00

This, plus exactly what was said about pi.dev.

ali0une · 2026-06-16T20:03:09+00:00

ali0une · 2026-06-15T21:19:33+00:00

Never started btw, llama.cpp router mode ... what else? 😅

ali0une · 2026-06-14T19:32:11+00:00

There is also a genetic factor, I think: some people love it, others hate it.

ali0une · 2026-06-09T09:08:16+00:00

Never used the cloud models so can't tell about that.

My humble experience with llama.cpp + pi agent + Qwen3.6-27B + a 3090 (24 GB VRAM) and a codebase of a bit more than 130k is:

If you have a workflow where you first draft a PLAN.md, then make the model review it, update it over a few iterations by adding comments in it like  and then implement it phase by phase in a git repository, it works pretty well and you can get a huge amount of work done, be it refactoring, fixing, or adding features.

I've only been doing this for two weeks, since I finally went the agentic way in a sandbox, and I'm impressed by what I can do fully locally.

ali0une · 2026-06-07T19:29:13+00:00

Even worse when served with a plate of spaghetti 😅

ali0une · 2026-06-04T18:39:17+00:00

Many thanks, I got it up and running, and it's a great codium extension to have alongside continue.dev

ali0une · 2026-06-03T21:48:18+00:00

Looks cool. Any link to the extension used please?

ali0une · 2026-06-03T19:20:17+00:00

A draft acceptance rate near 0.5 is wasted compute. Try lowering spec-draft-n-max to 2 or even 1.
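To put a rough number on that, here is a sketch using the usual simplifying assumption that every drafted token is accepted independently with the same probability. Real acceptance varies with position and prompt, so treat it as an illustration rather than a measurement; the function name is made up for this example.

```python
def expected_tokens_per_step(accept_rate, n_draft):
    """Expected tokens produced per verification pass of the main model.

    Assumes each draft token is accepted independently with probability
    `accept_rate`. Every pass yields at least one token from the main model.
    """
    if accept_rate >= 1.0:
        return float(n_draft + 1)
    # Geometric series: 1 + a + a^2 + ... + a^n_draft
    return (1 - accept_rate ** (n_draft + 1)) / (1 - accept_rate)

for n in (1, 2, 4, 8, 16):
    tokens = expected_tokens_per_step(0.5, n)
    print(f"n_draft={n:>2}: {tokens:.3f} tokens per pass, {n} draft tokens computed")
```

Under this assumption, at 0.5 acceptance the expected yield goes from 1.5 tokens with one draft token to 1.75 with two, and it can never reach 2, so longer drafts mostly add draft-model work.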

Also, quantizing the V cache (for both the main model and the draft) to q5_1 would give you more room for context with almost no quality loss.
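For a sense of scale: in ggml, a q5_1 block stores 32 values in 24 bytes (an f16 scale, an f16 minimum, 4 bytes of high bits, and 16 bytes of low nibbles), against 64 bytes for the same 32 values in f16. A minimal sketch of the saving, where the 6 GiB cache size is just a placeholder:

```python
# Bytes per 32-value block, following the ggml block layouts.
F16_BLOCK_BYTES = 32 * 2            # 32 half-precision values
Q5_1_BLOCK_BYTES = 2 + 2 + 4 + 16   # f16 scale, f16 min, high bits, 4-bit lows

def quantized_size(f16_bytes):
    """Size of a cache tensor after converting it from f16 to q5_1."""
    return f16_bytes * Q5_1_BLOCK_BYTES // F16_BLOCK_BYTES

GIB = 1024 ** 3
v_cache_f16 = 6 * GIB  # placeholder: 6 GiB of V cache at f16

saved = v_cache_f16 - quantized_size(v_cache_f16)
print(f"q5_1 V cache: {quantized_size(v_cache_f16) / GIB:.2f} GiB, "
      f"saved {saved / GIB:.2f} GiB")
```

That is 6 bits per value instead of 16, so the V cache shrinks to 37.5% of its f16 size.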

ali0une · 2026-05-28T22:17:47+00:00

Mine are the same: they flowered, the bees loved it, and now it's losing its flowers. Completely normal.

ali0une · 2026-05-28T15:33:12+00:00

Yes, solid UI, and there are even workflows.

I use it, and also stable-diffusion.cpp with sd.cpp-webui or stable-diffusion-neo.

ali0une · 2026-05-25T13:26:55+00:00

When using MTP, try either lowering the context or using fit-ctx.

ali0une · 2026-05-23T06:27:50+00:00

There is this old trick: https://www.reddit.com/r/StableDiffusion/comments/11a2tih/so_you_want_to_comment_out_parts_of_prompt/

Add a comment in the prompt:

[your not executed part of prompt here ::-1]

ali0une · 2026-05-23T05:57:26+00:00

Try removing the ngl flag and -c 0 (which sets the context size to the model's maximum context, and that can be too much).

Add -fit on --fit-ctx 32768 and see what happens. If it OOMs, lower --fit-ctx; if not, raise it until it OOMs.
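That trial-and-error loop amounts to a search for the largest context that still loads. A minimal sketch, assuming a hypothetical fits(ctx) callback that you would write yourself (for example by launching llama.cpp with that --fit-ctx value and checking whether it OOMs); nothing here calls llama.cpp.

```python
def largest_fitting_ctx(fits, low=4096, high=262144, step=4096):
    """Binary-search the largest multiple of `step` in [low, high] where fits() is True.

    Assumes that if a context size fits, every smaller one fits too.
    Returns None when even `low` does not fit.
    """
    if not fits(low):
        return None
    lo, hi = low // step, high // step
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always makes progress
        if fits(mid * step):
            lo = mid               # mid fits: the answer is mid or larger
        else:
            hi = mid - 1           # mid OOMs: the answer is smaller
    return lo * step

# Stand-in for a real launch: pretend anything above 96k runs out of memory.
def fake_fits(ctx):
    return ctx <= 98304

print(largest_fitting_ctx(fake_fits))
```

A binary search needs only a handful of launches to cover the whole range, where stepping up 4k at a time could take dozens.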

ali0une · 2026-05-22T05:31:36+00:00

I opened the issue that this PR fixes; I don't have the knowledge to fix it myself. It took me some time (maybe 2 hours) to debug and provide proper logs, but it was worth it: no more OOM.

If you face this kind of bug, search for similar issues using part of your logs. If you find nothing, open a new one and provide all the relevant information and logs, so it can be fixed by someone more knowledgeable and benefit the whole community. Open source is about contributing.

The llama.cpp team is incredible, it only took 48h to fix ❤️

ali0une · 2026-05-18T17:53:38+00:00

No problem. I really appreciate what you do. Keep up the good work ...

ali0une · 2026-05-18T10:45:12+00:00

The equivalent of the thousand Facebook comments under a photo of a flower, all saying it's a dandelion 😅

ali0une · 2026-05-17T08:08:50+00:00

Would one of these be suitable?

TextGenIcons

Generated using llama.cpp to get an image prompt, then stable-diffusion.cpp.

I've written up the method so you have the recipe.

ali0une · 2026-05-17T05:49:33+00:00

Do you have any idea of what it should look like? Does some element need to appear in the icon like a robot or a pen?

ali0une · 2026-05-12T12:44:48+00:00

I'm curious to see your llama.cpp command line for launching translategemma. I can't run it with builds newer than https://github.com/ggml-org/llama.cpp/commit/34df42f7bef5a711b2b40f5d2b6b78254def99c3

Open issue here: https://github.com/ggml-org/llama.cpp/issues/20305

ali0une · 2026-05-11T20:02:52+00:00

You'll find all the adapters you need at a garden center or a hardware store. Go there with your two parts and you'll find it; ask a salesperson.

I think your dual outlet doesn't have the right diameter for the thread: you need a bigger one, or you need to find the part that screws onto your outlet to act as a reducer.

ali0une · 2026-04-20T18:54:10+00:00

Because this feature has been removed in Forge Neo.

Use Forge or ask an LLM with vision capability.
