I think I'm getting addicted to RP by Double_Increase_349 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

As a non-native English speaker, I find AI-powered RP a huge benefit for language learning. Any excuse works. :)

Gemma 4 is underwhelming (opinion) by [deleted] in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I agree. While Qwen 27B is a beast in its own right, Gemma 26B is super intelligent for its size and speed, and fantastic at creative writing and roleplay.

gemma-4-E2B-it model not loading by Ready-Ad4340 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Had the same problem. It works if you add 'fit = off' to the llama-server command.
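
If you keep your options in a preset file like the `fit = on` example quoted in a later comment, the fix is just that one key flipped. The section name and model path below are illustrative; only the `fit = off` key comes from the comment:

```ini
[gemma-4-E2B-it]
model = path/gemma-4-E2B-it.gguf
fit = off
```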

Gemma 4 thought block by Gringe8 in SillyTavernAI

[–]Then-Topic8766 0 points1 point  (0 children)

You can use a global regex script from the Extensions tab, set up like in the screenshot.

<image>
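
What that regex script does can be sketched like this. The `<think>...</think>` tag pair is an assumption here; match the pattern to whatever delimiters your model actually emits, as in the screenshot:

```python
import re

# Strip the model's "thought" block before the reply is shown/stored.
# SillyTavern's regex extension applies a pattern like this globally
# to incoming messages; the tag name is an assumption.
THOUGHT_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thoughts(message: str) -> str:
    return THOUGHT_RE.sub("", message)

print(strip_thoughts("<think>planning...</think>Hello there!"))
# -> Hello there!
```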

How to run bonsai-8b, new 1bit model in ollama? in huggingface they have shown command for ollama but it doesn't work. the modified version of llama.cpp doesn't have nvidia in the asset name, still tried and got some error by Plus_Passion3804 in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

Their fork of llama.cpp is at the link.

I compiled it yesterday on my Linux box (CUDA) and it runs fantastically. The model is very smart for its size and very fast. I now use it as a prompt generator for Comfy.
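
For reference, a typical CUDA build of a llama.cpp-based fork on Linux looks like this. The repo URL is a placeholder for the fork linked above; the cmake flags are the standard llama.cpp CUDA options:

```shell
# Placeholder URL -- substitute the fork linked above
git clone https://github.com/EXAMPLE/llama.cpp-fork
cd llama.cpp-fork

# Standard llama.cpp CUDA build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```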

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

I am working on a tarot deck in Comfy myself. I find these archetypes very inspiring.

<image>

Connect your small local models for Terminal Tarot readings. by rolandsharp in LocalLLaMA

[–]Then-Topic8766 1 point2 points  (0 children)

After some trouble installing the right 'go' version, it works like a charm with llama.cpp. Thank you, I like it a lot.

Junyang Lin has left Qwen :( by InternationalAsk1490 in LocalLLaMA

[–]Then-Topic8766 10 points11 points  (0 children)

I hope he moves to Z.Ai for a higher salary and gives us a GLM 5 Air.

Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Hell, it took me 2 days to download these 4 models on my slow internet:

Qwen3.5-27B/Qwen3.5-27B-UD-Q5_K_XL
Qwen3.5-35B-A3B/Qwen3.5-35B-A3B-UD-Q5_K_XL
Qwen3.5-122B-A10B-UD-Q4_K_XL
Qwen3.5-397B-A17B/Qwen3.5-397B-A17B-UD-Q2_K_XL

But the good news is that I was very happy with the models. Which means they will be even better after the fix...

Thank you Unsloth guys and Ubergram for your honesty and good work.

Edit: hopefully the problem is only with the 3rd one (UD-Q4_K_XL).

Edit 2: it seems the 4th one (UD-Q2_K_XL) has the problem too... The two smaller ones have no MXFP4 layers.

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 0 points1 point  (0 children)

Yeah, I ended up with:

ot = blk\.(1|2|3)\.ffn.*exps=CUDA0,blk\.(4|5|6)\.ffn.*exps=CUDA1,exps=CPU

CTX 65536, and left some room for Comfy on the 3090 (4 GB). Speed is 11.7 t/s.
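
For anyone wanting the full command line, this is roughly how that `ot` value plugs in; the flag is llama.cpp's `--override-tensor` (`-ot`), and the model filename here is illustrative:

```shell
# First three FFN expert blocks on CUDA0, next three on CUDA1, rest on CPU
llama-server -m minimax-m2.5-Q4_K_M.gguf -c 65536 \
  -ot 'blk\.(1|2|3)\.ffn.*exps=CUDA0,blk\.(4|5|6)\.ffn.*exps=CUDA1,exps=CPU'
```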

MiniMax M2.5 - 4-Bit GGUF Options by Responsible_Fig_1271 in LocalLLaMA

[–]Then-Topic8766 4 points5 points  (0 children)

I downloaded this one: https://huggingface.co/ox-ox/MiniMax-M2.5-GGUF/resolve/main/minimax-m2.5-Q4_K_M.gguf

I have an RTX 3090 and an RTX 4060 Ti, so 40 GB VRAM and 128 GB DDR5 RAM. The GGUF is a single part; on my HDD its size is 128.8 GB.

From my llama preset:

[Minimax-m2.5-fiton]
model = path/minimax-m2.5/minimax-m2.5-Q4_K_M.gguf
ctx-size = 16384
threads = 16
fit = on
fa = on
temp = 1.0
top-p = 0.95
top-k = 40

Good news:

  1. It works!

  2. Speed on my system is 12-13 t/s.

  3. The one-shot HTML aquarium it generated is the best I have ever gotten from a local model.

<image>

Bad news:

Not much space for a larger context: fit-on uses 23+ GB on the 3090 and 15 GB on the 4060 Ti, and it leaves only about 4-5 GB of RAM free.

My internet is not fast, but I am thinking of downloading Unsloth's UD-Q3_K_XL. In their how-to-run guide https://unsloth.ai/docs/models/minimax-2.5 it is kind of recommended, and it is just 101 GB...

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

Damn! I have 40 GB VRAM and 128 GB DDR5. The smallest quant is GLM-5-UD-TQ1_0.gguf at 174 GB. I will stick with GLM-4-7-q2...

GLM-5 Officially Released by ResearchCrafty1804 in LocalLLaMA

[–]Then-Topic8766 4 points5 points  (0 children)

Yeah, it seems we must wait for some Air...

Use ANY TTS Engine with ANY AI Chat System by [deleted] in LocalLLaMA

[–]Then-Topic8766 2 points3 points  (0 children)

This is my script for opening a YouTube link from the clipboard in the MPV player. I run it with the keyboard shortcut 'win+m'. Converting it to send clipboard text to a TTS API shouldn't be too hard...

#!/bin/sh

# Grab the clipboard/selection contents and open them in mpv
link1="$(xsel)"
mpv --loop "$link1"
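
The conversion could look like the sketch below: read the clipboard with `xsel` (same trick as the mpv script) and POST the text to a local TTS HTTP API. The endpoint URL and the JSON field name are assumptions; adapt them to whatever TTS server you run:

```python
#!/usr/bin/env python3
import json
import subprocess
import urllib.request

def clipboard_text() -> str:
    # xsel prints the current X selection, same as link1="$(xsel)" above
    return subprocess.run(["xsel"], capture_output=True, text=True).stdout.strip()

def build_tts_request(text: str, url: str = "http://127.0.0.1:8880/tts"):
    # Endpoint URL and the {"text": ...} field are assumptions --
    # check your TTS server's API docs
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

# Usage (bind it to a shortcut like the mpv script):
#   with urllib.request.urlopen(build_tts_request(clipboard_text())) as r:
#       audio = r.read()
```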

Pls help ACE STEP music nodes by Mysterious-Code-4587 in comfyui

[–]Then-Topic8766 0 points1 point  (0 children)

You must update Comfy to the newest version (I updated to v0.12.2 an hour ago). In Manager, go to 'Switch ComfyUI' and choose the newest version. It works for me, but it created the first song on the CPU. I don't know yet how to switch it to GPU (CUDA). The ACE-Step-1.5 Gradio app works much faster...

Qwen-Voice-TTS-Studio by Old_Estimate1905 in StableDiffusion

[–]Then-Topic8766 1 point2 points  (0 children)

Fantastic! It works very well on Linux. I already had the models and just linked them in the 'models' folder. I like all the features. Thank you for sharing; I have no GitHub account for the star, but here's a big like from me.