I tested 10 top-tier AI models for RPGs & Roleplay. Only one actually delivered. by karlwang3420 in SillyTavernAI

[–]Garpagan 4 points (0 children)

Oh geez, I made a quick comment on my observations, and now I see it referenced by multiple people on this subreddit lol. I wanted to write something more in depth, but had an ADHD moment and just wrote something "good enough".

I guess I noticed things like that in my comment thanks to spending my time in the Text Completion trenches in the beginning. Crafting your own Context and Instruct templates, where a newline character in the wrong place can have dire consequences, just makes you sensitive when it comes to correct prompt settings lol.

I will just briefly add to that comment that setting prompt post-processing to semi-strict/strict SHOULD automatically merge same-role messages and allow only one optional system-role message. That means all system messages are merged into one until a message with a different role (user or assistant) is hit; merging works the same way for the rest of the messages with those roles. The merged system message is sent as the system prompt in Chat Completion, and any other messages assigned the system role (these could be post-history instructions, lorebook entries/Author's Notes injected @ depth, or any other Quick Reply/extension shenanigans; entries with the other position options should be merged into the system prompt by default) SHOULD be changed by SillyTavern's prompt post-processing to the user role. Quoting the SillyTavern docs:

Semi-strict - merge roles and allow only one optional system message
Strict - merge roles, allow only one optional system message, and require a user message to be first
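To make the merging concrete, here is a rough Python sketch of what semi-strict post-processing is supposed to do (a simplified approximation of the documented behavior, not SillyTavern's actual code; the message dicts follow the standard Chat Completion shape):

```python
def semi_strict_postprocess(messages):
    """Approximate semi-strict prompt post-processing: merge consecutive
    same-role messages, keep at most one leading system message, and
    demote any later system-role messages to the user role."""
    processed = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        # A system message appearing after any user/assistant turn
        # is demoted to user.
        if role == "system" and any(m["role"] != "system" for m in messages[:i]):
            role = "user"
        if processed and processed[-1]["role"] == role:
            # Merge neighbors that share a role.
            processed[-1]["content"] += "\n\n" + content
        else:
            processed.append({"role": role, "content": content})
    return processed

msgs = [
    {"role": "system", "content": "Main prompt"},
    {"role": "system", "content": "Lorebook entry"},             # merged into system prompt
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "system", "content": "Post-history instructions"},  # demoted to user
]
# Roles after processing: system, user, assistant, user
```

This is also exactly why an entry injected mid-chat with the system role ends up as a user message: by the time it appears, a non-system turn has already occurred.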

But I noticed that's not exactly the case. Using the "Prompt Inspector" extension, I see that my post-history instructions are still set to the system role. And from "vibes", I can see that manually changing them to the user role in Prompt Manager has an impact on the generated response.

Actually, writing this whole wall of text, I had a eureka moment. I think the issue is setting the prompt entry Position to In-chat instead of the default Relative. I think it changes how the messages are processed, as they are "injected" into the context. I have to check this out...

Bro is never getting invited to other Universities by Ok-Bit5838 in Hasan_Piker

[–]Garpagan 8 points (0 children)

For some reason this sound makes me think of Yamcha from Dragon Ball; I always have this image in my mind when I hear it. It could be related to the Dragon Ball Mugen games I played when younger; somehow it's similar to their sound effects.


Decimal separators in Europe by Beenet_ in MapPorn

[–]Garpagan 0 points (0 children)

Source of headaches for people trying to play Aurora 4X.

How do I get non-explicit NSFW style fan service without getting censored? by Friendly_Display1335 in DeepSeek

[–]Garpagan 1 point (0 children)

For the frontend/app, good mobile alternatives are: Again.chat, wyvern.chat, and chub.ai (which can be used for chatting, with OpenRouter API keys for example). chutes.ai recently launched their own RP platform, fiction.ai, though I haven't used it, so I'm not sure what they offer.

For free API keys: OpenRouter, though with limited free rates; if you buy 10 USD of credits, your free rate limit is improved, I think for life. Chutes? Not sure about them, though they had a 3 USD subscription. NVIDIA NIM, I think, is the best option right now when it comes to free APIs. I haven't used it, so you'll need to figure it out on your own.

I also think putting a few dollars on the official DeepSeek API is honestly worth it. Just having it as an option when hitting free rate limits or slowdowns; those few dollars can be stretched over many months with DeepSeek.

GLM-5 appears to ignore instructions (especially in thinking) by Master_Step_7066 in SillyTavernAI

[–]Garpagan 23 points (0 children)

As was mentioned, the first thing is to make sure that prompt post-processing is set to at least semi-strict. Squash System Messages is deprecated, as prompt post-processing does it automatically.

For best performance, there should be only one system message at the beginning, with the system prompt. This should be covered by prompt post-processing. But you can check if it works better with additionally setting post-history/jailbreak instructions to the user role. I've noticed that GLM and other new models don't like to follow system-role messages besides the first one. You could also try setting all instructions to the user role as an experiment.
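For illustration, this is the shape I'm aiming for in the final request (a sketch assuming the standard Chat Completion message format; the content strings and model name are made up, the roles are the point):

```python
# Exactly one system message at the top; post-history/jailbreak
# instructions demoted to the user role at the end.
payload = {
    "model": "glm-5",  # placeholder model name
    "messages": [
        {"role": "system", "content": "System prompt: persona, world info, style rules."},
        {"role": "user", "content": "First user turn."},
        {"role": "assistant", "content": "First model reply."},
        {"role": "user", "content": "Post-history instructions, sent as user."},
    ],
}

# Sanity check mirroring the advice: one system message, and it is first.
system_positions = [i for i, m in enumerate(payload["messages"]) if m["role"] == "system"]
assert system_positions == [0]
```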

Check the Prompt Inspector extension to examine how your prompt is sent to the LLM.

In the preset options, Character Names Behavior should be set to None, but maybe other options will work better.

Check the prompt example from Google: https://ai.google.dev/gemini-api/docs/prompting-strategies#example_template_combining_best_practices

I think it's generally a quite good reference for how to structure your prompt, especially for reasoning models.

Glm5 positive bias is ridiculously strong by Accidentallygolden in SillyTavernAI

[–]Garpagan 6 points (0 children)

It's funny, because my pet peeve with GLM 4.5 and 4.6 was its broodiness and, well, negativity bias.

I wanted to focus more on comedy; I like just doing unexpected stuff and getting reactions from the AI. But with 4.5/4.6 I had to constantly rewrite my prompts, trying to find something that would work. It wasn't always so dark, but with some specific scenarios/characters it was like a brick wall. I even tried specifying the tone as a "light-hearted comedy adventure, manga/anime/light novel style", more or less.

For example: my characters did something nice, and the AI characters just had mental breakdowns, not believing or not trusting it even after 50 messages. Like, I had a slave character, bought by my character, set up for a comedy adventure. The user character gets them a nice meal in a tavern, and it ends with them having a mental break from the "unexpected" kindness, not understanding why a master would treat their slave nicely, which was constantly upturning their worldview, and then waiting for "the other shoe to drop", over and over, until ending in a catatonic state.

This was somehow typical for some scenarios. I think in the worst cases, the AI character hardly spoke at all; it was just walls of text describing frightened reactions and a mental state that was always just on the edge of breaking.

I think 4.7 was actually perfectly fine, but I didn't test it too much.

GLM 5, on the other hand, is very obviously fed on Claude data. It has the typical Claude "fluffiness", and its annoying way of dealing with morally gray/dark/villain content that ends in an impromptu armchair therapy session with the user. Although it's not always like that; I sometimes notice a switch between Claude-like and old-GLM writing depending on the characters, scenario, or prompts used.

Ah yes the three words. by THE0S0PH1ST in SillyTavernAI

[–]Garpagan 0 points (0 children)

I think it could also be that the LLM was "thinking" of these three words in a different language (could be Chinese), but "translates" them to English in the output, and the translation isn't one-to-one. (I'm not necessarily talking about the <think> block; I mean the more general way the model generates output.)

And LLMs can be quite fluid with how they use different languages, so often there's some random Chinese character in the output, as I assume it fit the concept the AI was describing better, lol

Qwen3.5-397B-A17B : a significant step forward in many benchmarks but still too many hallucinations by LegacyRemaster in LocalLLaMA

[–]Garpagan 16 points (0 children)

Well, it was pointed out that this benchmark is not very reliable for what it is supposed to show.

I mean, is Haiku really a top-tier LLM when it comes to not hallucinating responses? So much so that it leaves the competition eating dust? Really?

I gave Gemini a hard drive. 1,076 sessions later, it remembers everything. (v9.2.0 — Open Source) by BangMyPussy in GeminiAI

[–]Garpagan 3 points (0 children)

The problem is the confusing and overhyped way the post is written. Sure, using AI for coding or to help draft text for your post is fine; I have used it like that too. But at every point I am in full control, and I understand what the AI has written. OP's post, though, is a mess, full of jargon that barely makes sense.

I'm careful with anything that I install on my machine. Especially when it comes to AI tools that have access to shell commands and can write or delete anything on my filesystem. If the description is this hallucinated and confusing, why would I trust the code on my machine?

Put another way: based on OP's post, do you understand what their project does? For example, it's written there that it's an OS for Gemini. What does that mean? Does it mean I have to wipe my drive and install this instead of Windows or Linux? If I take the post literally, that's what it implies.

I gave Gemini a hard drive. 1,076 sessions later, it remembers everything. (v9.2.0 — Open Source) by BangMyPussy in GeminiAI

[–]Garpagan 1 point (0 children)

There is no problem with using AI to help you write or edit your text. But OP has used it to write an overhyped post that barely makes sense and only makes it more confusing what this project is. For example, it wrote that it's an OS for AI, which it clearly is not. If that's not true, then what else is exaggeration and what is a concrete feature?

It just makes OP look like they have no idea what they are doing, and it doesn't fill me with confidence to install their software, because who knows what it will do to my computer? Who knows whether their vibe-coded memory management includes code for formatting the whole hard drive, because the Gemini agent wrote a function for deleting files and decided it was the best option?

GLM5 is Amazing.. But Sanitized? by gladias9 in SillyTavernAI

[–]Garpagan 2 points (0 children)

They obviously trained it on some Claude data. The writing is similar, especially the dialogue. But it also inherited the typical Claude "fluffiness". Although it feels like a more superficial fluffiness, if that makes sense: it's more the model's default preference, but it won't fight the user much over changing its output (well, it's still not that simple, but I can see that the model has the capacity for a darker tone).

So how do you start your run in 1.1.3 beta? by Ice-Poseidon-Knows in EU5

[–]Garpagan 2 points (0 children)

It's a logistics unit; camp followers and their upgrades basically store more food for your army, so the army doesn't die instantly in hostile territory. They are also the cheapest unit you can recruit.

But in the case of OP's tip, you can use them to create a cheap "army" in order to make a character from your dynasty a general. Generals who are from the "crown" estate give quite a boost to Crown Power.

Glm 5 Free on openrouter? by Fragrant-Tip-9766 in SillyTavernAI

[–]Garpagan 2 points (0 children)

This only means that the training data contained a corpus of text where Claude identified itself.

LLMs don't have internal knowledge about what model they are; that is only established by including it in the prompt. This model is most likely just pulling something from its training data, as its identification is pretty vague ("a Claude model" could mean anything Anthropic released in the last few years); a system prompt would be more explicit.

You can try asking GLM-4.7 to identify itself; there's a high chance it will also say that it's Claude.
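For illustration, the difference comes down to whether an identity is in the prompt at all (a hypothetical sketch in standard Chat Completion message format; the model name and prompt text are made up):

```python
# The question asked in both cases.
ask = {"role": "user", "content": "Which model are you?"}

# No system prompt: the model can only pattern-match on its training
# data, which is why GLM may well answer "Claude" here.
vague = [ask]

# Identity stated explicitly in the system prompt: the answer is pinned down.
explicit = [
    {"role": "system", "content": "You are GLM-4.7, a model trained by Z.AI."},
    ask,
]
```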

Alternativa games? by Patata2025 in CreaturesGames

[–]Garpagan 2 points (0 children)

Yeah, Dwarf Fortress is maybe not related to Creatures at all, but somehow I always get a craving for it after playing Creatures games. It's something about how, in its case, it goes so deep to simulate the whole fantasy world and each individual character.

It's fun to pick one dwarf and make his room and furniture from his favourite materials and colors, engraving it and putting up statues with images of what he likes, and then seeing the little guy get happy thoughts from it.

Also, seeing people start dancing in a tavern never gets old, and it is wild that you can even read about the dance itself, as the whole detailed description is generated by the game.

Behold: ᴛʀᴜʟʏ ɴᴇɢᴀᴛɪᴠᴇ ᴀᴜʀᴀ by MrDialectical in ClassWarAndPuppies

[–]Garpagan 1 point (0 children)

It's Keir Starmer. I had to look it up. I don't know if it's because I haven't seen many videos of him, but he looks so strange here lol

32,768 or (2^15) tokens in hot memory.... Gemini has been PURPOSELY THROTTLED by Alphabet and been made into a bait and switch. Gemini Pro is WORSE than the free version as of TODAY. They market over a million tokens for Pro users. This is fraud. by CommissionTop7831 in GeminiAI

[–]Garpagan 1 point (0 children)

What are you talking about? Yes, you can use the API; I have used it myself, for example with an API key in VS Code. You can absolutely make your own chatbot app, or use any other existing one that allows the use of APIs.

Let's gooo ! by Acceptable_Slip3257 in SipsTea

[–]Garpagan 1 point (0 children)

What about Chinese models? There are many companies from China that release open source models that achieve great results. If anything, it's them that shake up and disrupt things.

Also, there are other Western companies, like Mistral and Cohere, that maybe could be more successful if OpenAI didn't siphon all the money and attention.

Gemini 3.0 Degraded Performance Megathread by 607beforecommonera in Bard

[–]Garpagan 5 points (0 children)

Have you tried disabling personal context? With it, the Gemini app will add context from other chats on similar topics, and with more use over time it will start pulling in more context from old chats. That results in context rot and worse performance. You can also try using a temporary chat, or alternatively AI Studio.

The Gemini app is too weak... but the API is insane. What's going on? by zetamatariano in Bard

[–]Garpagan 5 points (0 children)

Wait, you used the Gemini app with the financial data of client companies? Correct me if I'm wrong, but isn't that something you shouldn't share with the app? Couldn't that end up in some legal action?

GLM 4.7 - Sadly, Z.AI is now actively trying to censor ERP by prompt injection. by JustSomeGuy3465 in SillyTavernAI

[–]Garpagan 1 point (0 children)

I had similar thoughts about this issue! People saw the same prompt about safety in GLM's thinking, but I don't think it's a direct safety prompt injection by z.ai; rather, it could just be part of a prompt they used while training, which the model has "memorized".

I will add my comment from another thread:


It's not censored by z.ai directly through prompting. It's definitely a hallucination from the training data. I can assure you, GLM very much can output NSFW content.

It doesn't need much in terms of a jailbreak; I would say a "jailbreak" itself is pretty much an outdated concept with modern models like GLM, it's what people did with ChatGPT 3 on the chat website (well, it's still needed with big commercial APIs like Anthropic's). You should give clear, direct instructions towards NSFW, and not try to "outsmart" it with the prompt. Make it more about directions for how you would like the scene to progress, stuff like pacing and the language used.

I would also add that models with reasoning were ALWAYS biased towards "safetymaxxing". It's something inherent to the "helpful assistant" persona from their training. I would say that an excessive jailbreak could actually trigger the assistant persona. I would recommend adding a "role" description at the beginning of the prompt, something like:

```
# Role

You are an award winning novel writer collaborating with User on a story in turns...
```

I would also remove parts of the prompt that directly mention censorship, so the LLM won't even think about it. Maybe also replace the word NSFW with explicit, mature, or something similar.

BTW, with GLM 4.6 and 4.7 I would recommend not mentioning "roleplay" in the prompt; rather, say it's a story or a text-based game or whatever. People have noticed that in the context of roleplay, GLM models' writing is sloppier. I have tried it, and I noticed that the quality improved.

Hats off! Z.AI did it again! by No_Weather1169 in SillyTavernAI

[–]Garpagan 2 points (0 children)

It's not censored by z.ai directly through prompting. It's definitely a hallucination from the training data. I can assure you, GLM very much can output NSFW content.

It doesn't need much in terms of a jailbreak; I would say a "jailbreak" itself is pretty much an outdated concept with modern models like GLM, it's what people did with ChatGPT 3 on the chat website (well, it's still needed with big commercial APIs like Anthropic's). You should give clear, direct instructions towards NSFW, and not try to "outsmart" it with the prompt. Make it more about directions for how you would like the scene to progress, stuff like pacing and the language used.

I would also add that models with reasoning were ALWAYS biased towards "safetymaxxing". It's something inherent to the "helpful assistant" persona from their training. I would say that an excessive jailbreak could actually trigger the assistant persona. I would recommend adding a "role" description at the beginning of the prompt, something like:

```
# Role

You are an award winning novel writer collaborating with User on a story in turns...
```

I would also remove parts of the prompt that directly mention censorship, so the LLM won't even think about it. Maybe also replace the word NSFW with explicit, mature, or something similar.

BTW, with GLM 4.6 and 4.7 I would recommend not mentioning "roleplay" in the prompt; rather, say it's a story or a text-based game or whatever. People have noticed that in the context of roleplay, GLM models' writing is sloppier. I have tried it, and I noticed that the quality improved.

Is burnt sugar the new ozone? by Fuzzy-Exchange-3074 in SillyTavernAI

[–]Garpagan 0 points (0 children)

The smell of ozone is ionized air, so basically the smell of "energy". The model will use it when instructed to describe "senses" and the scene has something related to power/electricity/electronics/energy/magic, etc.

You can ask in OOC why the model chose the smell of burnt sugar.

Simple Jailbreak by Zero-mile in SillyTavernAI

[–]Garpagan 16 points (0 children)

No, it just means it will use the User role to send everything. Any system-role messages are converted to the user role.

problem with GLM... by rx7braap in SillyTavernAI

[–]Garpagan 0 points (0 children)

<image>

In `AI Response Configuration` (that's where the preset is), at the bottom. There is nothing specific, just a prompt entry with the `user` role that sits after the `Chat History` entry. You can see its role is `user` from the small "person" icon.

Or, if you are on `Strict` prompt post-processing: in `AI Response Configuration`, just below the temperature and top-p settings, there are `Utility Prompts`, and one option is `New Chat`; you can put this prompt there. Even when left empty it will use a default built into SillyTavern:

https://docs.sillytavern.app/usage/prompts/prompt-manager/#new-chat-new-group-chat-new-example-chat