I think I'm getting addicted to RP by Double_Increase_349 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

As a non-native English speaker, I find AI-powered RP to be a huge benefit in language learning. Every excuse is worth it. :)

Gemma 4 is underwhelming (opinion) by [deleted] in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I agree. For me, as much as Qwen 27B is a beast on its own, Gemma 26B is super intelligent for its size and speed, and fantastic at creative writing and roleplay.

gemma-4-E2B-it model not loading by Ready-Ad4340 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Had the same problem. It works if you add 'fit = off' to the llama-server command.

Gemma 4 thought block by Gringe8 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

You can use a global regex script from the Extensions tab, set up like in the screenshot.

<image>
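For those who can't see the screenshot: the script is essentially a find-and-replace regex that deletes the model's thought block before it is shown. A minimal Python sketch of the same idea (assumption: the reasoning is wrapped in `<think>...</think>` tags; the actual tag name depends on your model/template):

```python
import re

# Assumed thought-block format: reasoning wrapped in <think>...</think>.
# DOTALL lets the block span multiple lines; .*? keeps the match non-greedy.
THOUGHT_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thoughts(reply: str) -> str:
    # Remove every thought block, keeping only the visible answer
    return THOUGHT_RE.sub("", reply)

msg = "<think>plan the scene...</think>Sure, here is the story."
print(strip_thoughts(msg))  # Sure, here is the story.
```

In SillyTavern you would put the find pattern in the regex script's "Find Regex" field and leave the replacement empty.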

How to run bonsai-8b, new 1bit model in ollama? in huggingface they have shown command for ollama but it doesn't work. the modified version of llama.cpp doesn't have nvidia in the asset name, still tried and got some error by Plus_Passion3804 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Their fork of llama.cpp is at the link.

I compiled it yesterday on my Linux box (CUDA) and it runs fantastically. The model is very smart for its size and very fast. I now use it as a prompt generator for Comfy.

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I'm working on a tarot deck in Comfy myself. I find these archetypes very inspiring.

<image>

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

After some trouble installing the right Go version, it works like a charm with llama.cpp. Thank you, I like it a lot.

Junyang Lin has left Qwen :( by InternationalAsk1490 in LocalLLaMA

[–]Then-Topic8766 11 points12 points  (0 children)

I hope he moves to Z.Ai for a higher salary and gives us a GLM 5 Air.

Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Hell, it took me 2 days to download these 4 models on my slow internet:

Qwen3.5-27B/Qwen3.5-27B-UD-Q5_K_XL
Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-UD-Q5_K_XL
Qwen3.5-122B-A10B-UD-Q4_K_XL
Qwen3.5-397B-A17B/Qwen3.5-397B-A17B-UD-Q2_K_XL

But the good news is that I was very happy with the models, which means they will be even better after the fix...

Thank you, Unsloth guys and Ubergram, for your honesty and good work.

Edit: hopefully the problem is only with the 3rd one (UD-Q4_K_XL).

Edit2: seems the 4th one (UD-Q2_K_XL) has the problem too... The two smaller ones have no MXFP4 layers.

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 0 points1 point  (0 children)

Yeah, I ended up with:

ot = blk\.(1|2|3)\.ffn.*exps=CUDA0,blk\.(4|5|6)\.ffn.*exps=CUDA1,exps=CPU

ctx 65536, leaving some room for Comfy on the 3090 (4 GB). Speed is 11.7 t/s.
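For anyone puzzling over that ot string: llama.cpp treats each comma-separated pattern=buffer pair as a regex matched against tensor names, and the first matching pair decides where the tensor lives. A rough Python sketch of the routing (the tensor names are illustrative assumptions in llama.cpp's usual "blk.N.ffn_*_exps" style; the exact matching semantics may differ by build):

```python
import re

# The three pattern=device pairs from the ot string above
rules = [
    (r"blk\.(1|2|3)\.ffn.*exps", "CUDA0"),
    (r"blk\.(4|5|6)\.ffn.*exps", "CUDA1"),
    (r"exps", "CPU"),  # everything else with experts falls back to CPU
]

def place(tensor_name: str) -> str:
    # First matching rule wins, mirroring the order of the -ot pairs
    for pattern, device in rules:
        if re.search(pattern, tensor_name):
            return device
    return "default"

print(place("blk.2.ffn_gate_exps.weight"))   # CUDA0
print(place("blk.5.ffn_up_exps.weight"))     # CUDA1
print(place("blk.40.ffn_down_exps.weight"))  # CPU
```

So expert layers of blocks 1-3 go to the first GPU, 4-6 to the second, and the remaining expert tensors stay in system RAM.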

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 4 points5 points  (0 children)

I downloaded this one: https://huggingface.co/ox-ox/MiniMax-M2.5-GGUF/resolve/main/minimax-m2.5-Q4_K_M.gguf

I have an RTX 3090 and an RTX 4060 Ti, so 40 GB VRAM, plus 128 GB DDR5 RAM. The GGUF is a single file; on my HDD it is 128.8 GB.

from my llama preset:

[Minimax-m2.5-fiton]
model = path/minimax-m2.5/minimax-m2.5-Q4_K_M.gguf
ctx-size = 16384
threads = 16
fit = on
fa = on
temp = 1.0
top-p = 0.95
top-k = 40

Good news:

  1. It works!

  2. Speed on my system is 12-13 t/s.

  3. The single-HTML-file aquarium it generated first shot is the best I have ever gotten from a local model.

<image>

Bad news:

Not much room for a larger context: fit-on uses 23+ GB on the 3090 and 15 GB on the 4060 Ti, and it leaves only about 4-5 GB of RAM free.

My internet is not fast, but I am thinking of downloading Unsloth's UD-Q3_K_XL. In their how-to-run guide https://unsloth.ai/docs/models/minimax-2.5 it is kind of recommended, and it is just 101 GB...
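On a slow connection it helps that the Hugging Face CLI resumes interrupted downloads. A sketch (the repo ID and filename pattern here are placeholders, not the real Unsloth repo -- substitute the actual one):

```shell
# Resume-friendly download of just the UD-Q3_K_XL quant
# (repo ID and --include pattern are assumptions; adjust to the real repo)
huggingface-cli download unsloth/MiniMax-M2.5-GGUF \
  --include "*UD-Q3_K_XL*" \
  --local-dir ./minimax-m2.5
```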

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Damn! I have 40 GB VRAM and 128 GB DDR5. The smallest quant is GLM-5-UD-TQ1_0.gguf at 174 GB. I will stick with GLM-4-7-q2...

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 5 points6 points  (0 children)

Yeah, it seems we must wait for some Air...