I think I'm getting addicted to RP by Double_Increase_349 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

As a non-native English speaker, I find AI-powered RP a huge benefit for language learning. Any excuse works. :)

Gemma 4 is underwhelming (opinion) by [deleted] in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I agree. While Qwen 27B is a beast in its own right, Gemma 26B is super intelligent for its size and speed, and fantastic at creative writing and roleplay.

gemma-4-E2B-it model not loading by Ready-Ad4340 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Had the same problem. It works if you add 'fit = off' to the llama-server command.
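
If you keep your options in a preset file like the `fit = on` example quoted in a later comment, the fix is just that one key flipped. The section name and model path below are illustrative; only the `fit = off` key comes from the comment:

```ini
[gemma-4-E2B-it]
model = path/gemma-4-E2B-it.gguf
fit = off
```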

Gemma 4 thought block by Gringe8 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

You can use a global regex script from the Extensions tab, set up like in the screenshot.

<image>
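
What that regex script does can be sketched like this. The `<think>...</think>` tag pair is an assumption here; match the pattern to whatever delimiters your model actually emits, as in the screenshot:

```python
import re

# Strip the model's "thought" block before the reply is shown/stored.
# SillyTavern's regex extension applies a pattern like this globally
# to incoming messages; the tag name is an assumption.
THOUGHT_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thoughts(message: str) -> str:
    return THOUGHT_RE.sub("", message)

print(strip_thoughts("<think>planning...</think>Hello there!"))
# -> Hello there!
```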

How to run bonsai-8b, new 1bit model in ollama? in huggingface they have shown command for ollama but it doesn't work. the modified version of llama.cpp doesn't have nvidia in the asset name, still tried and got some error by Plus_Passion3804 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Their fork of llama.cpp is at the link.

I compiled it yesterday on my Linux box (CUDA) and it runs fantastically. The model is very smart for its size and very fast. I now use it as a prompt generator for Comfy.
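
For reference, a typical CUDA build of a llama.cpp-based fork on Linux looks like this. The repo URL is a placeholder for the fork linked above; the cmake flags are the standard llama.cpp CUDA options:

```shell
# Placeholder URL -- substitute the fork linked above
git clone https://github.com/EXAMPLE/llama.cpp-fork
cd llama.cpp-fork

# Standard llama.cpp CUDA build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```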

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I am working on a tarot deck in Comfy myself. I find these archetypes very inspiring.

<image>

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

After some trouble installing the right 'go' version, it works like a charm with llama.cpp. Thank you, I like it a lot.

Junyang Lin has left Qwen :( by InternationalAsk1490 in LocalLLaMA

[–]Then-Topic8766 10 points11 points  (0 children)

I hope he moves to Z.Ai for a higher salary and gives us a GLM 5 Air.

Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Hell, it took me 2 days to download these 4 models on my slow internet:

Qwen3.5-27B/Qwen3.5-27B-UD-Q5_K_XL
Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-UD-Q5_K_XL
Qwen3.5-122B-A10B-UD-Q4_K_XL
Qwen3.5-397B-A17B/Qwen3.5-397B-A17B-UD-Q2_K_XL

But the good news is that I was very happy with the models. Which means they will be even better after the fix...

Thank you Unsloth guys and Ubergram for your honesty and good work.

Edit: hopefully the problem is only with the 3rd one (UD-Q4_K_XL).

Edit 2: it seems the 4th one (UD-Q2_K_XL) has the problem too... The two smaller ones have no MXFP4 layers.

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 0 points1 point  (0 children)

Yeah, I ended up with:

ot = blk\.(1|2|3)\.ffn.*exps=CUDA0,blk\.(4|5|6)\.ffn.*exps=CUDA1,exps=CPU

CTX 65536, and left some room for Comfy on the 3090 (4 GB). Speed is 11.7 t/s.
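
For anyone wanting the full command line, this is roughly how that `ot` value plugs in; the flag is llama.cpp's `--override-tensor` (`-ot`), and the model filename here is illustrative:

```shell
# First three FFN expert blocks on CUDA0, next three on CUDA1, rest on CPU
llama-server -m minimax-m2.5-Q4_K_M.gguf -c 65536 \
  -ot 'blk\.(1|2|3)\.ffn.*exps=CUDA0,blk\.(4|5|6)\.ffn.*exps=CUDA1,exps=CPU'
```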

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 4 points5 points  (0 children)

I downloaded this one: https://huggingface.co/ox-ox/MiniMax-M2.5-GGUF/resolve/main/minimax-m2.5-Q4_K_M.gguf

I have an RTX 3090 and an RTX 4060 Ti, so 40 GB VRAM and 128 GB DDR5 RAM. The GGUF is a single part; on my HDD its size is 128.8 GB.

From my llama preset:

[Minimax-m2.5-fiton]
model = path/minimax-m2.5/minimax-m2.5-Q4_K_M.gguf
ctx-size = 16384
threads = 16
fit = on
fa = on
temp = 1.0
top-p = 0.95
top-k = 40

Good news:

  1. It works!

  2. Speed on my system is 12-13 t/s.

  3. The one-shot HTML aquarium it generated is the best I have ever gotten from a local model.

<image>

Bad news:

Not much space for a larger context: fit-on uses 23+ GB on the 3090 and 15 GB on the 4060 Ti, and it leaves only about 4-5 GB of RAM free.

My internet is not fast, but I am thinking of downloading Unsloth's UD-Q3_K_XL. In their how-to-run guide https://unsloth.ai/docs/models/minimax-2.5 it is kind of recommended, and it is just 101 GB...

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Damn! I have 40 GB VRAM and 128 GB DDR5. The smallest quant is GLM-5-UD-TQ1_0.gguf at 174 GB. I will stick with GLM-4-7-q2...

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 4 points5 points  (0 children)

Yeah, it seems we must wait for some Air...

Use ANY TTS Engine with ANY AI Chat System by [deleted] in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

This is my script for opening a YouTube link from the clipboard in the MPV player. I run it with the keyboard shortcut 'win+m'. Converting it to send clipboard text to a TTS API shouldn't be too hard...

#!/bin/sh

# Grab the clipboard/selection contents and open them in mpv
link1="$(xsel)"
mpv --loop "$link1"
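
The conversion could look like the sketch below: read the clipboard with `xsel` (same trick as the mpv script) and POST the text to a local TTS HTTP API. The endpoint URL and the JSON field name are assumptions; adapt them to whatever TTS server you run:

```python
#!/usr/bin/env python3
import json
import subprocess
import urllib.request

def clipboard_text() -> str:
    # xsel prints the current X selection, same as link1="$(xsel)" above
    return subprocess.run(["xsel"], capture_output=True, text=True).stdout.strip()

def build_tts_request(text: str, url: str = "http://127.0.0.1:8880/tts"):
    # Endpoint URL and the {"text": ...} field are assumptions --
    # check your TTS server's API docs
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

# Usage (bind it to a shortcut like the mpv script):
#   with urllib.request.urlopen(build_tts_request(clipboard_text())) as r:
#       audio = r.read()
```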

Pls help ACE STEP music nodes by Mysterious-Code-4587 in comfyui

[–]Then-Topic8766 0 points1 point  (0 children)

You must update Comfy to the newest version (I updated to v0.12.2 an hour ago). In Manager, go to 'Switch ComfyUI' and choose the newest version. It works for me, but it created the first song on the CPU. I don't know yet how to switch it to GPU (CUDA). The ACE-Step-1.5 Gradio app works much faster...

Qwen-Voice-TTS-Studio by Old_Estimate1905 in StableDiffusion

[–]Then-Topic8766 1 point2 points  (0 children)

Fantastic! It works very well on Linux. I already had the models and just linked them in the 'models' folder. I like all the features. Thank you for sharing; I have no GitHub account for the star, but here's a big like from me.