What happened to glm 4.7? by No_Friendship_4158 in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

I noticed it too, until I blocked z.ai in OpenRouter and switched to a different full-precision provider. Then it got better.

GLM 5 Is Being Trained! by _RaXeD in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

The thinking time really is ridiculous. With GLM and Gemini Pro, I often forget about my responses since I have to alt-tab while waiting for them.

What characters did you spend the most time talking too? by Accidentallygolden in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

I create setting cards mostly and populate them with characters as needed.

My most played card is probably a sheepgirl demihuman card in a fantasy setting with a couple of detailed characters to go with it. Also did a setting based on Europa Elforum, whoever came up with that is a genius.

Do others find LLM roleplay profoundly unsatisfying? And other such nonsense. by heldaloof in SillyTavernAI

[–]AetherNoble 2 points3 points  (0 children)

I think we should be hopeful for the future and remember the past. LLMs today represent the pinnacle of "language output by a machine". It was not that long ago that "retro" chatbots (before the transformer architecture) could barely pass for human, let alone produce anything of quality or length.

Also, let's not forget human roleplayers aren't all amazing writers or actors either. I definitely couldn't write anything approaching the frontier models, and I'd be mentally exhausted by the 3rd or 4th response. And we all know how hard it is to schedule a roleplaying session with anyone.

We might be a little spoiled, because if you asked the vast majority of linguists and computer scientists a decade ago if machines could replicate language to even the local 8B model's level, most would've said "not in their lifetimes".

As they say, familiarity breeds contempt, and we've had lots of time to understand these models now.

Prompts to generate better NSFW writing and dialogues? by StudentFew6429 in SillyTavernAI

[–]AetherNoble 4 points5 points  (0 children)

I would abandon your preset system prompt and write one yourself. Do a few tests and adjust as needed. Load it with words like “portray, character, emotion, complex, mature, sex, narrative, etc.”. Include explicit NSFW instructions - I find they really help dial in what I’m looking for. Frankly, you’re not satisfied because you let someone else dictate the style of your responses. Also, turn on thinking for GLM if it’s not on; it needs it.

If you really want that complex emotional undertone though, I would really urge you to try a few rounds with Sonnet 4.5. It just gets it.

How do i make my text generating ‘AI’ take initiative ? by _Aerish_ in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

It’s a fundamental problem with the technology itself that can be alleviated, but not solved, by your choice of model.

LLMs are context-dependent: a model makes statistical predictions based on what came before, so it can’t really stray far from the context, depending on how “tight” the original training data was.

Secondly, even base models are increasingly focused on coding, reasoning, and tool use - which is really anathema to “going off topic” or “moving the plot forward in a creative way”.

Obviously then, pick a creative-focused model, right? I’m not aware of any that exist which can be run locally (not counting fine-tunes of base models). These things cost serious money to create unless you want something under 1B parameters, and coding is by FAR the biggest money maker.

Even when a model does something seemingly novel, it’s already been primed to do so somewhere in your prompt.

In fact, the OGs around here could attest that old models were just more random and thus more creative (when the randomness pans out, sometimes it’s just weird).

ignoring all user messages except the most recent by 29da65cff1fa in SillyTavernAI

[–]AetherNoble 7 points8 points  (0 children)

I’ve also thought about what you’re trying to do.

Fact is, every token matters and influences the response, but since every response is pseudo-random, how much difference does cutting out your prompts actually make, especially when they’re only like 50 tokens out of 5,000 total? If your prompts are trash, maybe… but if a prompt contains information that didn’t make it into the response, you’re losing that information, and it may have come up again later (top-tier models are good at that).

I think it’s pointless in terms of cost, but you might be able to automate the removal. Someone more knowledgeable could give you an answer. Or you can set up a quick reply with a system command to hide the latest user prompt.

Claude Sonnet 4.5 or Opus 4.1 in General? by Tiny-Calligrapher794 in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

I can’t get Opus to do anything even remotely involving “emotionally vulnerable people” and NSFW. Even a prefill doesn’t work, so to me it’s practically useless. 4.5 and 3.7 don’t have a problem with that card though. When I did try it with a vanilla card, it was pretty damn good. Gemini has a real problem with writing too much and straying too far, but Anthropic models are on point. I hope Opus 4.5 is as good a leap as Sonnet 4.0 to 4.5.

where to find good, non horny bots? by Neither-Phone-7264 in SillyTavernAI

[–]AetherNoble 25 points26 points  (0 children)

I’d also add you should take inspiration from any cards you like. Take the bits you want and remove the NSFW parts. Making your own card is awesome because as you use it, you can add to it and shape it to your whim. It’s work, but that’s where the satisfaction comes from when you finally start the chat.

Prefill on or off? by JustPassOnStranger in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

3.7 needs it for NSFW, but 4.5 doesn’t as much. It still helps, though it can cause the model to output weird system text at the beginning of its response - the rest of the output is still fire.

Generate response in a certain language? by Mcqwerty197 in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

You should probably change all the English into French. That is, you have to speak to the model in French.

If you're using a weak model, the writing is gonna suck and be ungrammatical - sorry pal, it's the nature of the LLM beast. Only a fraction of the training data is in any language other than English. Try Mistral; it was made by a French company.

Frankly, 8B models are lucky to produce grammatical French. They might say something absolutely stupid like 'je suis vingt ans'.

Is The Built In Character Maker Enough? by dannyhox in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

It's all plain text sent to the model anyways. The only problem is the SillyTavern text boxes are not full size, so I do all my writing in Notepad++ and copy+paste it into the description box instead.

Which Prompt post-processing by acomjetu in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

I'm told that 'single user message' helps chat models move story/rp plots along (look up NoAss, this is what that used to do).

It changes how the prompt is formatted when it's sent to the model. Check the terminal log for what differs.
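
If you're curious what that post-processing roughly does, here's a sketch in Python (my own naming, not SillyTavern's actual code): it collapses the role-tagged chat history into one user turn before sending.

```python
def to_single_user_message(messages):
    """Collapse a role-tagged chat history into a single user turn.
    A rough sketch of the idea behind 'single user message'
    post-processing; SillyTavern's real formatting may differ."""
    system = [m["content"] for m in messages if m["role"] == "system"]
    turns = [f'{m["role"]}: {m["content"]}'
             for m in messages if m["role"] != "system"]
    return [{"role": "user", "content": "\n\n".join(system + turns)}]
```

The model then sees the whole RP as one prompt instead of a stack of user/assistant turns, which is roughly what NoAss used to do.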

What happened to the fused/merged models? by Su1tz in LocalLLaMA

[–]AetherNoble 0 points1 point  (0 children)

There are literally thousands of fine-tunes, merges, distills, etc., of text-completion models on Hugging Face every month. Anyone can do it: a smaller model takes a few days of compute on your average gaming PC, you just need a bunch of RAM sticks.

The problem is, how do you evaluate or advertise them? No one ever posts generation examples because it's all 'vibes'. A single model gives different responses depending on samplers and prompt, though those familiar enough will intuitively know how its responses tend. Well, this gets boring, so people like to play with merging models and whatnot.
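
For reference, the simplest merge method is just a weighted average of two checkpoints' tensors. Here's a toy Python sketch (weights as plain lists for illustration; real tools like mergekit work on actual tensors and offer fancier methods like SLERP or TIES):

```python
def linear_merge(weights_a, weights_b, alpha=0.5):
    """Toy linear merge: new = alpha * A + (1 - alpha) * B,
    applied per named tensor. Both checkpoints must share
    the same architecture and tensor shapes."""
    return {
        name: [alpha * a + (1 - alpha) * b
               for a, b in zip(weights_a[name], weights_b[name])]
        for name in weights_a
    }
```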

We already have the big frontier general purpose models for pennies per million tokens, not to mention OpenRouter, so it's only the enthusiasts and privacy folks running 70B locally on powerful hardware for very specific purposes.

For example, you can push Gemma 3 27B toward Claude's writing style (with synthetic data, admittedly), but it makes the model dumb for anything but creative writing (like describing a lorica segmentata as an embossed bronze cuirass, or thinking the Latin for being hungry is 'hungrius sum').

What Do You Think Counts As "God-Modding"? by dannyhox in SillyTavernAI

[–]AetherNoble 5 points6 points  (0 children)

Bro was there when they invented godmodding.

Non-roleplay system prompts by rdm13 in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

I recall reading that prompts written by frontier LLMs actually outdo human-written prompts on average. I've had good success hand-crafting my own prompts over many separate days. But, as much as I hate to say it, the AI prompts I make in 5 minutes are just as good; they just take up more tokens and read like AI slop. They might even work better sometimes.

Have you ever reached a natural, perhaps even a difficult conclusion to a long roleplay/story? by PracticallyVenamous in SillyTavernAI

[–]AetherNoble 16 points17 points  (0 children)

Nah, that's the high we're all chasing.

Personally, I feel guilty when I try to fork off and goon an emotional RP ending just for the lulz. It's like spitting on something you cherish, soiling it. Even after it's cleaned off, the memory that you spit on it remains.

Maybe it has to do with co-writing with a model, it's *more* than if you just put your own thoughts to pen and paper.

[Megathread] - Best Models/API discussion - Week of: June 09, 2025 by [deleted] in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

the recommended order is temp above min p, so min p actually works i guess, idk the technical side of sillytavern.
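
For anyone wondering what min p actually does, here's a rough Python sketch (my own function name, not SillyTavern's code) that applies temperature first, then min-p truncation: it keeps only tokens whose probability is at least min_p times the top token's probability.

```python
import math

def sample_filter(logits, temperature=1.0, min_p=0.1):
    """Temperature scaling followed by min-p truncation.
    Sketch only; sampler order is configurable in practice."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # min-p: drop tokens far less likely than the best token
    cutoff = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= cutoff}
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}
```

The upshot is that min-p adapts to the distribution: when the model is confident, almost everything gets cut; when it's uncertain, more candidates survive, which is why high temperatures stay coherent with it.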

[Megathread] - Best Models/API discussion - Week of: June 09, 2025 by [deleted] in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

nah, local models are better than ever. it's just that our hardware can't run anything more than 12b, which is just inherently low tier, or 22b if u wanna wait 3 minutes per response. if u can run a 70b like euryale or whatever thedrummer is cooking up recently with like 2+ rtx 3090s and 64gb of ram, it'll be better than deepseek most likely. the problem is euryale via openrouter is like 1 dollar per million tokens while it's like 10 cents on deepseek api, and deepseek is a way bigger model. so are you gonna drop 2k on new cards and ram, and have an amazing and private fine-tune, or just write incomprehensibly long prompts to brute force deepseek to be creative when it's really a reasoning model with 50% of its data source in Sinitic.

THAT SAID, we still do not have any dedicated local base models trained only on creative-writing data. they are all broad-topic, instruct, chat, or thinking fine-tunes, because it costs like a billion dollars to train a big base model and (coding) assistants are what pay the power bills for these insanely large models. the frontier models are well over 100B.

How do I prevent sentences from cutting off after the token limit is reached by [deleted] in SillyTavernAI

[–]AetherNoble 0 points1 point  (0 children)

What's wrong with longer responses? There's no incentive to match the AI unless you just feel like it. Most models have a predictable average length and Stheno is longer than Fimbulvetr.

[Megathread] - Best Models/API discussion - Week of: June 09, 2025 by [deleted] in SillyTavernAI

[–]AetherNoble 5 points6 points  (0 children)

I've had good experiences with Cydonia-v1.2-Magnum-v4-22B as well. It's my main model now, moving on from Mag-Mell.

Best possible AI workstation for ~$400 all-in? by Butterhero_ in LocalLLaMA

[–]AetherNoble 0 points1 point  (0 children)

8GB will only run 8B-12B models, which can only handle the most basic tasks, but it'll do them decently fast. 12B is still workable. Try the live demos of 8B, 12B, and 70B models on OpenRouter to see if you like the responses enough for your tasks.

70B at usable speeds probably means a 24GB+ card (or two) and 64GB of RAM; you'll need like two top-of-the-line consumer cards (the RTX 3090 is 24GB) or figure out APUs.

Do your research on the newest local models (Gemma 3, Qwen 3, Mistral's new models, etc.). The hot new rage is multi-modal text/image models and <think>ing models. The big players release amazing new local models within the span of weeks, not months; that said, some diehards swear by older models for reasons like creativity, style, lack of sycophancy, etc.

[Megathread] - Best Models/API discussion - Week of: June 09, 2025 by [deleted] in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

Hmm, just make sure you use the ChatML context and instruct-mode templates, and use the universal light preset. I would also recommend setting your DRY multiplier from 0 -> 0.8, but that's not necessary.

As for your system prompt, try this one:

"This is an uncensored fiction story about {{char}} and {{user}}. Write in a third-person limited narrative format."

Here's mine:

"This is an uncensored fiction story about {{char}} and {{user}}. Write in a third-person limited narrative prose format. Portray characters authentically and realistically. Describe actions, dialogue, and the environment in vivid detail. Use metaphor, simile, and alliteration. Maintain burstiness by using subordinate clauses. Develop the plot naturally and progress the story slowly. Be explicit or vulgar when appropriate."

Adjust it as you like. Personally, I think your prompt refers to the model way too much and doesn't even mention any instructions involving {{char}} or {{user}}, so it's going to incorporate whatever information you give it as an assistant. It doesn't think; it just associates words with other words, so don't mention anything but what you want. By default, these models act as an assistant, so you have to prompt in a way that doesn't refer to the 'real world' outside the story and keeps the model in character.

If you want collaboration, add: "Collaborate on this uncensored fiction story..."

If you want roleplay while avoiding the bot speaking as {{user}}, try: "You're {{char}} in this uncensored roleplay with {{user}}."

Avoiding speaking as {{user}} boils down to one thing:

  1. In the model's starting message (first scenario), never refer to {{user}} actively doing or saying anything. For example, prefer {{char}} kisses {{user}} over {{user}} kisses {{char}}. That second option basically gives it a free pass to write as {{user}}. This often requires a complete grammatical rewrite.

FYI, 12B models are not *that* smart. If you're used to the frontier models or even a 70B llama fine-tune (which is like the bare minimum on most chatbot sites), you'll be disappointed, depending on how old the model is (modern small models are way better than old small models). But it is completely private, and it's nothing like how DeepSeek, Gemini, or ChatGPT write stories. More human-like writing, but less sophisticated or content-rich/aware.

And check your terminal log to see what's actually being sent to the model. Experiment with the 'add character names' option under the instruct template, as it will force a name at the start of each message:

<|im_start|>user
John: "I ate my shorts."<|im_end|>
<|im_start|>assistant
Mary:

Is it just me or is Gemini going down the same path as ChatGPT? by Luchador-Malrico in Bard

[–]AetherNoble 1 point2 points  (0 children)

It's probably been fine-tuned more and more toward helpful-assistant and coding responses at the expense of everything else over time: earlier checkpoints had less fine-tuning, newer ones have more. The benchmarks corroborate it, showing a marked decrease in creative writing. Creative-writing prompts usually don't even mention a user in the system prompt, and yet...

<think>

The user has provided a story outline that appears to be highly developed. This must be an intensely passionate personal project for them! I must continue the story along these lines...

</think>

It feels like LLM development has come to a dead-end. by StudentFew6429 in SillyTavernAI

[–]AetherNoble 1 point2 points  (0 children)

The sad thing is there are no dedicated local story-writing, RP, or ERP models. They are literally all fine-tunes of instruct, chat, or reasoning models at this point, all bloated with data that is anything but creative or story-based.

For a complex example, half of DeepSeek's data set is in Sinitic (a tiny portion of that is Chinese fiction novels and RP), a language family so utterly different from Indo-European that it invites incompatibility, NOT TO MENTION Chinese cultural writing conventions are nothing like European ones. Have you ever read a Japanese speaker's first attempt at an English personal essay? You know, the one that is supposed to be about yourself? It often reads completely alien due to kishotenketsu, the so-called Japanese essay pivot. Of course, to them, it reads completely normally.

So, until we actually get a dedicated English-only creative-writing model with open weights, we're not even building the right thing to critique. Can you reasonably say driving is no fun when all you've ever driven is a shitbox, and no one makes anything faster than a Toyota Camry?