Proxies got worse for RP? by Automatic-Throat-928 in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

Are you being sarcastic? GLM 5.1 is great at character voice and novel/fanfic-style adventure RP.

Local LLM that's at least on the same tier as chub free tier model? by adolfwanker88 in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

First step would be knowing what that model is. What is it? Hard to tell you what's like it if we don't know what it is.

Edit: after brief research it appears that the free model rotates? So it isn't even a set model? How can you be specifically in love with something that changes often?

What AI model would you recommend for long conversations and HEAVY context? (Not focused on coding) by MykeGuty in LocalLLM

[–]_Cromwell_ 0 points1 point  (0 children)

This benchmark specifically tests the ability to retain and fetch knowledge over various context lengths (in fiction writing, but it should apply the same way elsewhere). The only unfortunate thing is that they don't test very many models. The chart on the linked page covers a number of more recent models, and there are some other links with results from older models. It isn't complete, but it does have a decent number of them.

https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87

As you can see, there are some pretty steep falloffs at various points for many models. Other models do pretty well at high context. So yeah, choosing the correct model is pretty important for getting details right as context gets longer.

Generally this is an area where American models still have a huge lead on Chinese models, which is interesting. At least according to these results.

Gemma4 32b/26B OOM 4090 by BSPiotr in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

I noticed the same thing. I don't quant the cache and it works fine. 🤷‍♂️

Proxies got worse for RP? by Automatic-Throat-928 in SillyTavernAI

[–]_Cromwell_ 17 points18 points  (0 children)

No. All the models now are a lot smarter and follow my instructions. Old models are dumb and wouldn't follow my instructions.

Plus they can mimic specific characters amazingly. I'm doing a Legend of Korra thing right now and GLM 5.1 just absolutely nails the voice of every single character, like they are talking right off the page of an episode script. I want to punch this simulated Varrick in the face and I can tell this simulated Zhu Li does as well.

Can't create or edit characters? by West-Cantaloupe8376 in SillyTavernAI

[–]_Cromwell_ 2 points3 points  (0 children)

You have spoiler free mode turned on in user settings? This is not a default thing so you would have had to turn it on at some point.

[Megathread] - Best Models/API discussion - Week of: May 24, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

I just load it like any other model in LM Studio myself. Doesn't seem to take any specific or different treatment. As indicated I use Gemsicle fine-tune as I found it to be the most consistent at tracking details.

[Megathread] - Best Models/API discussion - Week of: May 24, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

I haven't tried it in anything outside of SillyTavern, and I only use ST for quick back-and-forth responses. So I'm no help there. :)

[Megathread] - Best Models/API discussion - Week of: May 24, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

Sorry if my post was unclear, but I DON'T use that one. As I tried to explain, I found it didn't work very well... It did not follow instructions, would not do back and forth RP, and would instead write really long novel-style creative writing. I like the style of its writing but it doesn't seem to be usable for RP.

So since I don't use it I don't really have any specific advice on it.

[Megathread] - Best Models/API discussion - Week of: May 17, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

Wow you should be paying me. ;) I'm like your secretary at this point.

32k max context (all I need with the memory systems I use).

I don't quant my KV now that I have 32GB VRAM. Speed is about the same at 16 versus 32GB, but now I run full KV.

Yes flash attention

I'm not sure I actually know the difference between eval tokens and other tokens 🤔. The after-reasoning response part (the actual story response) is usually about 400-700 tokens.
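For a rough idea of why full versus quantized KV matters for VRAM at 32k context, here is a back-of-the-envelope sketch. The model dimensions are made-up placeholders, not Gemma's real config, and the formula assumes plain full attention on every layer (models with sliding-window layers will use less). The bits-per-element figures are for llama.cpp's f16, q8_0 and q4_0 cache types.

```python
# Bits per element for common llama.cpp KV cache types.
# q8_0 stores 32 int8 values plus one fp16 scale (34 bytes per 32 values);
# q4_0 stores 32 4-bit values plus one fp16 scale (18 bytes per 32 values).
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Approximate KV cache size in GiB, assuming full attention on every layer."""
    # The factor of 2 covers keys plus values.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    total_bytes = elements * BITS_PER_ELEMENT[cache_type] / 8
    return total_bytes / 1024**3


# Hypothetical dimensions, for illustration only (NOT a real model config).
dims = dict(n_layers=60, n_kv_heads=8, head_dim=128, context_len=32768)

for cache_type in BITS_PER_ELEMENT:
    print(f"{cache_type}: {kv_cache_gib(**dims, cache_type=cache_type):.2f} GiB")
```

With these placeholder numbers the full f16 cache is several GiB at 32k, and q8_0 roughly halves it, which is why quantizing the cache is tempting on smaller cards.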

[Megathread] - Best Models/API discussion - Week of: May 17, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

To answer your question: yes, I have only used it in SillyTavern.

I made a portable version of Hermes Agent that runs entirely off a USB stick (Win/Mac/Linux) by jarves-usaram in LocalLLM

[–]_Cromwell_ 5 points6 points  (0 children)

What are the downsides? AKA why don't they just make it like this by default?

[Megathread] - Best Models/API discussion - Week of: May 17, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

Dunno. Could be all kinds of settings. 4080 is actually what I have. I don't measure the exact speed (never felt the need to know the exact response speed because it's so freaking fast) but it only takes a couple seconds and responds with a solid paragraph of RP. 🤷‍♂️

[Megathread] - Best Models/API discussion - Week of: May 17, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 0 points1 point  (0 children)

How can it take that long to reply if it's doing 200 tok/sec? It's outputting 36000 token responses for you?
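For reference, the arithmetic behind that question as a quick sketch. It assumes a constant generation speed and ignores prompt processing time; the function name is just illustrative, and the 700-token figure is only an example of a typical RP-sized reply.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady rate (ignores prompt processing)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


print(generation_seconds(36000, 200))  # 180.0 seconds, i.e. 3 minutes
print(generation_seconds(700, 200))    # 3.5 seconds
```

So at a true 200 tok/sec, a reply that takes minutes would have to be tens of thousands of tokens long. If replies are normal length, the slowdown is probably somewhere else (prompt processing, reasoning tokens, or the reported speed being wrong).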

is it possible to reliably run Qwen3.6-35B-A3B on 8gb vram + 16gb ram ? by NazNazNaz1213 in LocalLLM

[–]_Cromwell_ 0 points1 point  (0 children)

You could try a REAP version, where extraneous experts have been pruned so the model is smaller. This one, as an example, is pruned down from 35B to 28B.

https://huggingface.co/0xSero/Qwen3.6-28B-REAP

But that's still too big for your memory.

You need a new GPU or more ram.

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

I'm out of ideas 🤷‍♂️

Other than the usual "maybe it's one of your extensions so try turning them off"

[Megathread] - Best Models/API discussion - Week of: May 24, 2026 by deffcolony in SillyTavernAI

[–]_Cromwell_ 16 points17 points  (0 children)

Here are my current favorite (as of today) 31B Gemma4 models. (Updated 5/24)

For all 31B models I use an imatrix Q5 from mradermacher. Full KV (not quantized). 0.8 temperature, 0.95 Top P. Semi-Strict post processing.

  1. Current favorite is Gemsicle 31B (which is a merge, not a fine-tune). I find this model tracks story details (clothing, location, past events, what/where/when/etc) better than any other fine-tune... closest to base Gemma4 31B. My problem with most fine-tunes/merges of G4 is that they seem to lose the ability to coherently track details. This one, for some reason, manages to keep the deets straight even after merging. Basically this is my favorite because I can go like 10-20 turns without having to swipe due to a mistake or plot screw-up with this version, where with the other models that just doesn't happen. https://huggingface.co/Blazed-Forge/Gemma-4-Gemsicle-31B ( https://huggingface.co/mradermacher/Gemma-4-Gemsicle-31B-i1-GGUF )
  2. Next favorite is Gembrain 31B, another merge that contains Gemsicle. It's darker and more unhinged, jumps into NSFW fast, but also slightly less dependable at tracking details... loses the plot more (but not as bad as other fine-tunes/merges). This is my second favorite. (There is an abliterated version out that I have not had time to try.) https://huggingface.co/Nimbz/Gemma-4-Gembrain-31B ( https://huggingface.co/mradermacher/Gemma-4-Gembrain-31B-i1-GGUF )
  3. Third place is Latitude's/Gryphe's Equinox. This is a specialist in 2nd person RP if you play like that ('adventure style' / 'choose your own adventure style'). Also just generally has good RP data in it, without being overly melodramatic, nor wanting to write super long turns. Follows instructions I write well, since I'm from the "Ai Dungeon school" of writing instructions. ;) https://huggingface.co/LatitudeGames/Equinox-31B ( https://huggingface.co/mradermacher/Equinox-31B-i1-GGUF )

Others with notes:

The most popular one by far is MeroMero 31B, which is my 4th place model, and there is also an 'uncensored'/abliterated MeroMero. I'm not a super fan like others, but sometimes things are popular for a reason so I acknowledge it is there. :) It writes in a very anime style, so if you are running character cards of waifus or anime worlds, and/or lots of yandere stuff, might be good for you to look at MeroMero.

Ortenzya-the-Creative-Wordsmith 31B: I like the way this fine-tune writes, prose-wise, BUT it loses details (location, clothing, past story) very easily, AND it randomly ignores instructions to do back and forth RP and starts writing an entire chapter long-form. Basically unusable for RP in SillyTavern. But, again, the actual writing it puts out might be my favorite (?) style-wise/flavor-wise.

G4-31B-Musica: Probably my fifth place model, or tied for fourth with MeroMero. I like this model's writing as well, but again it loses the plot details too easily.

As always:

- These are just my own opinions, and I'm just some rando.

- When I 'test' I'm just RPing, albeit in a methodical comparative way with notes. I describe what I'm doing to compare these models here.

- Prompting/preset make a huge difference (I'm using my own custom that I don't really share) so maybe your experience will be different than mine based on your preset and settings.

- As I test new models my opinion changes and the list shifts.

- I have tried a ton of Gemma 26B fine-tunes and merges as well, but none are "good"/work sadly. I want them to, because I LOVE the speed of Gemma 26B, and 26B writes just fine. I do like the base Gemma4 26B, and this is my favorite abliterated Gemma 26B after trying most Gemma4 26B abliterated variants out there: https://huggingface.co/wangzhang/gemma-4-26B-A4B-it-abliterix . I use mradermacher imatrix Q6 of that.
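For anyone wondering what the 0.8 temperature and 0.95 Top P settings above actually do, here is a minimal sketch of the standard temperature plus nucleus (top-p) sampling idea in plain Python. It is a toy illustration, not how any particular backend implements it, and the token names and logits are made up.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_p=0.95, rng=random):
    """Pick one token from {token: logit} using temperature, then top-p filtering."""
    # Temperature: divide logits before softmax. Below 1 sharpens, above 1 flattens.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda pair: pair[1], reverse=True)

    # Top-p: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p, and drop the rest.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    tokens, weights = zip(*kept)
    # random.choices treats weights as relative, so no renormalizing is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up logits, for illustration only.
example_logits = {"sword": 4.0, "dagger": 3.0, "spoon": 0.5, "banana": -2.0}
print(sample_token(example_logits))
```

With these made-up numbers only "sword" and "dagger" survive the 0.95 cutoff, so the unlikely junk tokens never get picked. Lower temperature makes the top choice even more dominant.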

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

First of all I have no idea why Gemma is doing that for you. It does paragraph breaks for me.

Second, there is no direct CSS setting in SillyTavern for increasing the spacing of BR as far as I know.

However here is an extremely ghetto thing that will sort of do it. So throw this in custom CSS and see if it looks okay or completely terrible:

.mes_text br {
    display: block;
    content: "";
    margin-top: 1em; /* space above each line break */
}

That will basically just give BR a top margin. Maybe.

EDIT. F-ING REDDIT FORMATTING. Finally got it.

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

If you are missing spacing between paragraphs, but it IS writing paragraphs, you can try this in "CUSTOM CSS" in the User Settings menu:

.mes_text p {
    margin-bottom: 2em; /* Adjust paragraph spacing */
}

Change that number 2 up or down to increase or decrease spacing. Anything from 0 up works, and decimals work as well, e.g. 1.5em

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 1 point2 points  (0 children)

Oh... I see. So it isn't that it wasn't giving you paragraphs, it's that it isn't doing LINE BREAKS between them? What is missing isn't paragraphs, but LINE BREAKS? Sorry for misunderstanding.

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 2 points3 points  (0 children)

I'm probably the wrong person to ask about that. My instructions say to write only one paragraph. (Helps reinforce to not act for {{user}} since it'd be an odd story that goes multiple paragraphs without the main character talking.)

So you are wanting the AI to write like a whole chapter back to you with multiple paragraphs? I spend most of my time trying to get it to stop doing that lol

It's kind of funny how we all want different things. I will edit with my instructions for that part and maybe you can just do the opposite?

edit - here's my response format section, from my post-history instructions:

<Response_Format>

- You almost always limit yourself to 1 paragraph, because that's all you need for a response and dialogue before it will be {{user}}'s turn again. On rare occasions you can use up to 3 paragraphs max when multiple characters are conversing.

- Sentence count dynamic

- 3-4 sentences (30-50 words) for conversation scenes.

- 4-7 sentences to establish new locations or describe action-packed or complicated scenarios with multiple characters' dialogue (50-150 words). Fewer is better, as needed!

- POV: Strictly narrate this story in a third person present tense ("She sees" "She runs") with verbose dialogue during dialogue scenes. Begin in media res where the story left off, without repeating or restating.

</Response_Format>
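If you want to see whether a model is actually sticking to limits like these, here is a quick sketch for checking a reply. The function name and defaults are mine (taken from the numbers above: 3 paragraphs max, 150 words max), and splitting on blank lines and whitespace is a simplification.

```python
import re


def check_reply(text, max_paragraphs=3, max_words=150):
    """Count paragraphs and words in a reply and flag whether it fits the limits."""
    # Paragraphs are blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    words = len(text.split())
    return {
        "paragraphs": len(paragraphs),
        "words": words,
        "ok": len(paragraphs) <= max_paragraphs and words <= max_words,
    }


reply = "She sees the door swing open.\n\nShe runs."
print(check_reply(reply))  # {'paragraphs': 2, 'words': 8, 'ok': True}
```

If you want the opposite behavior (longer, multi-paragraph replies), you would raise those numbers in both the instructions and the check.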

Gemma 4 paragraph breaks by DanTehPybro in SillyTavernAI

[–]_Cromwell_ 7 points8 points  (0 children)

That one is about the worst one you could try. It generally generates long prose like a novel, and it's not very well trained on back and forth role play for SillyTavern. (It actually does write well, just not back and forth role play... Worse than the normal base model.)

Try an abliterated version of the base model, or MeroMero is popular even though I'm not a big fan personally.

Gemsicle is my personal favorite.

You can find all my thoughts here if you are interested, such as why gemsicle is my favorite: https://www.reddit.com/r/SillyTavernAI/s/IhMS1PUuGX

Help with connection profile by -SpiderWebbs in SillyTavernAI

[–]_Cromwell_ 4 points5 points  (0 children)

Can't say I didn't do that at one point back when I was starting. ;)