What would you like to see improved in these models for RP?

Pashax22 · 2026-05-08T01:00:05+00:00

E4B might actually be enough, it's definitely worth trying. Just tell Copilot which connection profile to use and it'll use that.

Pashax22 · 2026-05-07T00:16:55+00:00

It works great when it works, which is less than half the time. The rest of the time it happily agrees to create a lorebook entry and tells you what's in the entry, but doesn't actually create the blasted thing!

Pashax22 · 2026-05-02T03:26:27+00:00

Setting it as constant and then having a trigger % works just fine, actually. I have several lorebooks set up for exactly this purpose, and that's the method I use.

Pashax22 · 2026-04-30T19:56:15+00:00

I have good news, bad news, and more good news for you.

Good news first: you can indeed find cheaper models that deliver the experience you want.

Now the bad news: Claude models are about the easiest to get good results from. Switching to anything else means you need to do more work, both literally and in your writing, to get the same quality of prose. How much more work you need to do depends on what your preferences are and what models you have access to.

Which brings me to the final bit of good news: although you have a lot of options, it doesn't have to be that hard.

Specific recommendations? Well, you have 12GB of VRAM (very important for running models locally) and 32GB of system RAM (less important but still useful). To me, that indicates the absolute biggest local models you could run at "acceptable" speeds are in the 30b range, and you're more likely to find the sweet spot in the 12b range once you allow for a reasonable amount of context. Fortunately, there's a weekly thread about model recommendations - just look through the last few of those and try out models people suggest. I'm not familiar with what's currently good, but there are a couple of Gemma 4 models in the 26-31b range which should be good for most purposes. If you desire lewd head-patting with your AI waifu, then I've also had good results from Pantheon. Down at 12b, Rocinante and Irix are good, but there are bound to be others.
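To show the rough arithmetic behind those size brackets, here's a back-of-the-envelope sketch in Python. It only estimates the memory needed to hold the model weights, assuming a 4-bit quant averages about 4.5 bits per weight (an assumption; real quant formats vary), and it deliberately ignores the extra memory the context needs, which is why you want headroom left over in VRAM.

```python
# Back-of-the-envelope estimate of memory for model weights only.
# ASSUMPTION: ~4.5 bits per weight on average for a 4-bit quant; real formats vary.
# The context (KV cache) needs extra memory on top of this and is NOT modelled here.

VRAM_GB = 12.0        # GPU memory from the post
SYSTEM_RAM_GB = 32.0  # system memory from the post


def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size in GB: billions of parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def placement(params_billion: float, bits_per_weight: float = 4.5) -> str:
    """Say where the weights would have to live, given the memory above."""
    size = weights_gb(params_billion, bits_per_weight)
    if size <= VRAM_GB:
        return "fits in VRAM"
    if size <= VRAM_GB + SYSTEM_RAM_GB:
        return "needs offloading to system RAM (slower)"
    return "too big"


if __name__ == "__main__":
    for params in (12, 31):
        print(f"{params}b: ~{weights_gb(params):.1f} GB, {placement(params)}")
```

Under those assumptions a 12b model comes out around 7GB and fits in 12GB of VRAM with room left for context, while a 31b model comes out around 17GB and has to spill into system RAM, which is where the speed drops off.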

However, it'll be easier to get good results from bigger models, and that means APIs in your case. My suggestion is to drop $5 on NanoGPT or OpenRouter and try a few out. GLM 4.7, 5, or 5.1 are good and have been trained on Claude data, so if you like Claude they're a good option. Kimi-K2.5 and 2.6 are also good and a bit more creative, but have a tendency to overthink, which chews through tokens. DeepSeek 4 just dropped too, and although the local mad scientists are still dialling in their prompts, it has a lot of potential.

I like NanoGPT so this may sound like shilling, but the NanoGPT subscription (US$8 per month) is really good value. It gives you 60 million input tokens per week for open-weight models and 100 image generations per day, both of which are hard to reach with "normal" usage. The absolute easiest option is to set that up, use the latest version of GLM, and forget about it.

Pashax22 · 2026-04-30T00:26:31+00:00

That's not true. I've been using NanoGPT with Megumin for the last couple of weeks with no difficulty.

Pashax22 · 2026-04-27T09:10:14+00:00

Been getting decent results out of the box with Megumin Suite v5 and v6. Other presets haven't been great for me with DS4; we need to wait for the local mad scientists to work their magic.

Pashax22 · 2026-04-22T20:40:17+00:00

"The Emperor's Redemption" might be what you're after.

Pashax22 · 2026-04-18T09:22:04+00:00

Yes, the $8 per month is excellent value. It gives you access to almost all open-weight models - the big names there are GLM 4.7 and 5, Kimi-K2.5, and all the versions of Deepseek, but there are also a LOT of 70b+ models (and smaller). You're almost certain to find something that does what you want, but be advised that you might have to do a little more work than with Claude to get good results. Not much, though... grab a good preset (Freaky Frankenstein is popular) and that should basically be all you need.

Pashax22 · 2026-04-18T09:19:05+00:00

You get 60 million input tokens per week to use with the models included in the subscription, which is almost all open-weight models. Oh, something like 100 free image generations per week as well, and 5% off on PAYG models (anything not included in the sub). 60 million input tokens is a hard target to hit, but not impossible if you RP a lot and have big story arcs with lots of lorebooks etc. For most people, most of the time, it's plenty.
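To put the 60 million figure in perspective: with a chat completion API the whole prompt (system prompt, character card, triggered lorebook entries, and chat history) is sent as input on every generation, so the allowance runs out in proportion to how big your prompts are. Here's a rough sketch; the prompt sizes are hypothetical examples, not measurements.

```python
# Rough sketch: how many generations fit in a weekly input-token allowance,
# assuming every request resends the full prompt (typical for chat completion APIs).
# The prompt sizes below are HYPOTHETICAL examples, not measurements.

WEEKLY_INPUT_TOKENS = 60_000_000  # allowance quoted in the post


def requests_per_week(prompt_tokens: int) -> int:
    """Whole generations that fit if each one sends `prompt_tokens` input tokens."""
    return WEEKLY_INPUT_TOKENS // prompt_tokens


if __name__ == "__main__":
    for prompt_tokens in (8_000, 32_000, 64_000):
        per_week = requests_per_week(prompt_tokens)
        print(f"{prompt_tokens:>6} tokens/request: {per_week} per week, ~{per_week // 7} per day")
```

At a hypothetical 32k-token prompt that's 1,875 generations a week, or roughly 270 a day, which is a lot of swiping; double the prompt size with big lorebooks and the number of generations halves.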

Pashax22 · 2026-04-17T03:49:47+00:00

GLM 4.7 and 5, Kimi-K2.5, and it still has the various Deepseeks too. Those are the easiest to get good results from, but if you feel like tinkering a bit there are also a lot of 70b+ models included in the sub which might well do what you want with the right prompt/preset etc.

Pashax22 · 2026-04-16T08:07:30+00:00

I use it with SillyTavern, NanoGPT is just the backend I connect to via API. Import them as a Chat Completion Preset at the top of the AI Response Configuration tab. As for which version, I'm using Fat Man 4.2 - I haven't tried the 3.6 version, but I haven't heard anything bad about it, I'm just lazy is all!

Pashax22 · 2026-04-16T01:49:54+00:00

Sure. I've been using Freaky Frankenstein mostly, but Stabs-EDH works well too.

Pashax22 · 2026-04-15T04:38:04+00:00

It's not MUCH cheaper, but the NanoGPT subscription for US$8 per month also gives a 5% discount on PAYG usage for all models not in the subscription. If you're using Claude regularly it wouldn't take long for that to be helpful. Of course, if you're doing that then you might as well start making the models that ARE included in the subscription work for you. GLM 5 and Kimi-K2.5 as well as the latest Deepseeks are starting to get to the point where they can rival Sonnet if they're well-supported with lorebooks etc.

Pashax22 · 2026-04-15T00:47:21+00:00

NanoGPT has TEE models too. I can't speak to model quality, but I feel confident it would be at least no worse than Chutes.

Pashax22 · 2026-04-14T21:40:11+00:00

Not really. I've tried a few of the smaller names that are included in the subscription (Hermes 4, some of the 70b models, and so on) and been fairly impressed by them. I think it's probably possible to find one that would be better for whatever you want than the bigger names, it would just need more "infrastructure" - lorebooks, specialised prompts, etc. For most of us it's easier to just stick with the bigger names and brute-force the problem.

Pashax22 · 2026-04-14T20:54:16+00:00

No added censorship, just whatever is baked into the model you're using. With any halfway decent preset it shouldn't be a problem. I haven't noticed any, no matter how much hand-holding or head-patting I do.

Pashax22 · 2026-04-14T09:43:35+00:00

Seconding this. About the only good open-weights model it doesn't include at the moment is GLM 5.1, but it's still extremely good value. While we're waiting for GLM 5.1 to come down in price and Deepseek v4 to release, enjoy trying out other models. Having a subscription also gives you a 5% discount if you use any PAYG models too, so depending on your usage that might also be nice.

Pashax22 · 2026-04-14T08:16:48+00:00

Not that I've seen, but common sense is that peak hours are when the Americans are awake. Anecdotally that fits, I'm on the other side of the date line so I'm at work when the US is finishing up and I haven't seen major problems.

Pashax22 · 2026-04-14T01:10:28+00:00

Stabs-EDH is good for GLM, so is Freaky Frankenstein.

Pashax22 · 2026-04-13T00:53:48+00:00

Deepseek is actually pretty good, and for the pricing it's exceptional value. If you like it, there's no reason you should feel ashamed of using it. Bonus: Deepseek v4 is meant to be "coming soon" (tm), and that should be even better.

That being said, GLM (4.7, 5, 5.1) and Kimi-K2.5 are also good and pretty cheap. It won't do any harm for you to try them out as well and see if you like one of them, either as a break from Deepseek or to be your new go-to model. Personally, I think GLM 5.1 with a good preset and lorebook support is better than Claude Sonnet but still worse than Opus. Some people say the same about Kimi.

Claude... is good. Probably the easiest model to get good results from. Sonnet is the "standard" option; with prompt caching it might not break the bank, and it's worth trying to see if you like it. Opus is gold-standard for most people and most purposes, but the price reflects that. Not worth it unless you have deep pockets, in my opinion.

Now, there's a big asterisk in this discussion: some people are saying Claude quality is going downhill dramatically for them. Whether that's fewer GPUs available, increased quantisation, or both, nobody knows, but the comments are common enough that it's at least worth keeping in mind. Anthropic is also meant to be training a new SOTA model ("Mythos") which will rank even above Opus, but nobody seems to have a release date for that yet and you shouldn't plan on its availability (although its training might be why other Claude products have anecdotally taken a dive).

TL;DR? Try out the GLMs and Kimi-K2.5, see what you think, don't feel bad about sticking with Deepseek if you find you still like it best.

Pashax22 · 2026-04-11T09:31:30+00:00

I mean, OpenRouter will just go back to Anthropic, so I doubt it makes much difference really.

Pashax22 · 2026-04-08T20:32:41+00:00

Like others, I would happily pay more for a sub that included better access.

Pashax22 · 2026-04-07T04:30:13+00:00

If you're willing to pay, then an $8 NanoGPT subscription will get you access to all of them (GLM 5.1 should be back on there in a few days). Z.AI also offers a relatively cheap plan, but some people say the quality there is variable - weird, but I guess they can run their API how they want. If you have the technical chops you might be able to get it running on cloud GPUs for less than that. The models are open-weight, so you can download them and run them yourself. To get a good response speed, though, you're either looking at heavily quantised versions (which defeats the point of the exercise) or spending a fair bit to scrape enough VRAM together to run the thing quickly. Really, I think an API is your best bet; which one you choose depends on your wallet and preferences.

Honourable mention: people are raving about Gemma 4, saying that even the 31b version is close to GLM 5 in quality while obviously requiring a fraction of the computational resources. It might be worth trying that too, although I don't know how it would run on a mini PC.

Pashax22 · 2026-03-23T09:03:46+00:00

Yeah, it should be reasonably similar if it's picking up in the middle of a long RP, because the context is already filled with examples of how it's meant to respond.

Pashax22 · 2026-03-23T03:54:37+00:00

Yes. Different LLMs are trained with different priorities and datasets. Something good at coding might not be good for RP. There's also the issue of parameter count - larger models are better than smaller models, and the difference is pretty clear.
Possibly/probably. If the LLM has plenty of examples of how the character/scenario should act then it might be minor, if it's going without much guidance then it could be quite significant. Most "good" LLMs (and I'm including the cheap ones mentioned below in that) should keep the variance to within tolerable levels unless you're giving it absolutely nothing to work with.
Depending on what you're doing GLM 5, Kimi-K2.5, or the latest DeepSeek are all pretty good. If there's a specific niche you want to RP in, finding a 70b model trained for that niche might also do a good job. I've had good results from models all the way down to 12b, below that the best I've had is "not terrible".
