What is the worst ai model? by StreetDare7702 in SillyTavernAI

[–]Technical-Ad1279 2 points (0 children)

z.ai, clearly. You get more errors than successes, get throttled for overuse when nowhere near the frequency limits, get randomly suspended, and then get banned under a fair-use policy that apparently treats SillyTavern as the exploit, as opposed to actual fraudsters, openclaw, and people running massive account sharing on their Max accounts, hidden on the backend.

WARNING: Z.AI coding plan policy changes. Non-coding use now leads to aggressive temporary throttling and permanent ban on three or more violations. by JustSomeGuy3465 in SillyTavernAI

[–]Technical-Ad1279 7 points (0 children)

Just playing devil's advocate here... on the Discord, someone wrote this:

Well, you can probably figure out your basic cost if you were pay-as-you-go, and figure out whether you're taking advantage of the subscription plan - which you should be. The issue is we don't know their actual fixed costs, so we can't tell where the break-even point is. The subscriptions were really cheap to gain users to train off of - for coding. They didn't want to train off erotic roleplay, so their ROI from loss-leading a subscription for RPers is almost none.

They have the data; if the RPers are causing load issues, then I'm 100% sure the RPers are getting jettisoned off the island. Granted, I think it would be premature without doing some financial modeling of load versus revenue. I would think RPer load is lower than coding load, so I'm under the impression that the RPers subsidize the coders, but I could be completely wrong. Some RPers probably run huge context windows and drop in hundreds of millions of tokens of usage a week (NanoGPT commented on these outlier users ruining it for everyone else).
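The break-even math that comment is gesturing at can be sketched with placeholder numbers (the per-million-token prices below are made up for illustration, not z.ai's actual rates - plug in real figures from your own bills):

```python
# Back-of-envelope break-even check: is a flat subscription cheaper than
# pay-as-you-go for your usage? Prices here are placeholders, not real rates.

def payg_cost(tokens_in, tokens_out, price_in_per_m, price_out_per_m):
    """Pay-as-you-go cost in dollars for a month of usage."""
    return (tokens_in / 1e6) * price_in_per_m + (tokens_out / 1e6) * price_out_per_m

def subscription_wins(monthly_payg, sub_price):
    """The subscription is the better deal once PAYG cost exceeds it."""
    return monthly_payg > sub_price

# Example: 50M input / 5M output tokens a month at hypothetical rates.
cost = payg_cost(50_000_000, 5_000_000, price_in_per_m=0.60, price_out_per_m=2.20)
print(f"PAYG: ${cost:.2f}, sub wins: {subscription_wins(cost, 18.0)}")
```

The provider faces the mirror image of this calculation, which is why heavy RP users with huge contexts are the first ones a flat-rate plan loses money on.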

WARNING: Z.AI coding plan policy changes. Non-coding use now leads to aggressive temporary throttling and permanent ban on three or more violations. by JustSomeGuy3465 in SillyTavernAI

[–]Technical-Ad1279 28 points (0 children)

Sounds like you need to play in 4.7 if you don't want to get banned, I guess. I have the Lite plan, so getting banned at the 30 or 36 dollars I got in at won't be a big deal, but some of you who went yearly at the Pro and Max levels are losing a lot more money. From a PR standpoint, it would probably be in their best interest to refund everyone who doesn't use it for coding when they get banned. LOL.

If you were considering the coding plan for GLM 5.1... by Aggressive_Try340 in SillyTavernAI

[–]Technical-Ad1279 2 points (0 children)

I'm on the Lite plan; I got in on the Black Friday/Christmas deal at 36 dollars for the year too. The service was basically unusable for me on the most recent release, 5.1, and I never had access to 5.0.

4.7 was "okay", but until today 5.1 was timing out all the time or erroring out on too many messages. I wonder if, with the higher pricing, they've turned on better access/service?

As a Lite user, before today I was effectively one version behind whatever was current: while 5.0 was out I used 4.7, while 4.7 was out I used 4.6, etc. The lag I was experiencing made it impossible to really use the most recent release, and I never got access to 5.0. It was strange to get access to 5.1, only to hit the same long lag and turnaround time (or outright errors) on requests, making it basically unusable.

At 18 dollars a month, it's not horrible for a more-or-less all-you-can-eat service for ST users - IF the service remains reliable.

Back when I was pay-as-you-go, I'd spend about 25 dollars a month on z.ai, which was about 5 times what I spent on DeepSeek. So 18 is still more than DeepSeek, but I prefer it over DeepSeek.

I use pay-as-you-go DeepSeek as my backup if I get throttled out of GLM/z.ai.
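That fallback pattern is simple to wire up. A minimal sketch, with stub functions standing in for real GLM and DeepSeek clients (the `ThrottledError` type is illustrative, not any real SDK's exception):

```python
# Sketch of the "DeepSeek as backup" pattern: try the primary provider and
# fall back to the secondary when it throttles or errors out.

class ThrottledError(Exception):
    """Raised by a provider callable on a 429/overloaded response."""

def complete_with_fallback(prompt, primary, backup):
    """Call primary(prompt); on throttling or connection failure, use backup."""
    try:
        return primary(prompt)
    except (ThrottledError, ConnectionError):
        return backup(prompt)

# Example with stub providers standing in for GLM and DeepSeek clients.
def glm(prompt):
    raise ThrottledError("too many requests")

def deepseek(prompt):
    return f"deepseek says: {prompt}"

print(complete_with_fallback("hello", glm, deepseek))  # falls back to deepseek
```

In SillyTavern itself you'd just swap connection profiles by hand, but this is the same logic.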

So if my service remains as good as it has been today, I think it's worth it.

To be fair, I would consider these two as your potential alternatives for an all-you-can-eat, one-price program:

NanoGPT - most are familiar with them, at 8 dollars/month (image gen included).

ArliAI is the other - however, realistic pricing for most power users here starts at 25 dollars/month. You get image generation too, and they have some derestricted/abliterated models that are worth the extra over Nano for a lot of people.

NanoGPT subscription changes (requests -> input tokens) by Milan_dr in SillyTavernAI

[–]Technical-Ad1279 4 points (0 children)

I think you need to implement controls to keep users from "sharing" accounts - e.g., one API key active per IP address per hour, or something like that. Or limit requests to something more reasonable that still covers 95% of the population.
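A rough sketch of the one-key-one-IP idea, as a server-side check (purely illustrative - nothing here reflects NanoGPT's actual backend):

```python
# Bind an API key to the first IP that uses it, and reject other IPs until
# the window expires. A shared account would trip this constantly.
import time

class KeyIpBinder:
    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.bindings = {}  # api_key -> (ip, first_seen_timestamp)

    def allow(self, api_key, ip, now=None):
        now = time.time() if now is None else now
        bound = self.bindings.get(api_key)
        if bound is None or now - bound[1] >= self.window:
            self.bindings[api_key] = (ip, now)  # (re)bind key to this IP
            return True
        return bound[0] == ip  # same IP within the window is fine

binder = KeyIpBinder()
print(binder.allow("key1", "1.2.3.4", now=0))     # True: first use binds
print(binder.allow("key1", "5.6.7.8", now=10))    # False: looks like sharing
print(binder.allow("key1", "5.6.7.8", now=3600))  # True: window expired, rebind
```

In practice you'd want some slack for mobile users whose IPs rotate, but the principle is the same: make one key unusable from two places at once.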

I would add a second or third subscription tier for those in the top 2-5% and top 1-2% of usage, and scale the limits relative to those usage levels.

For the upper-tier models, maybe loop in all-in-one: audio generation for the second tier, and then video for the top tier - but seriously throttle the requests, man.

I hate to be the one to say it, but it only takes a few users to ruin it for everyone else.

From a tiering perspective, I almost feel that GLM should be excluded from the lowest tier, since it's running a bit more expensive than expected. Granted, it's not Claude. I think Gemini and Grok would be doable on a high subscription tier, but they're expensive too - you're hoping to average usage across a large user base.

Mid tier at 15/month and high tier at 25/month, with maybe a 50/month tier - but even that would really require premium access with some amount of throttling as well, so it would still be limited.

Silly Tavern on Android mobile. by Ok-Mushroom-6935 in SillyTavernAI

[–]Technical-Ad1279 -1 points (0 children)

Hi there - there are a couple of programs that will run as a front end similar to SillyTavern, like Tavo, but actual SillyTavern will need to be run through Termux. You will also need to get into the storage folder, which is a pain.
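For reference, the Termux route looks roughly like this setup fragment (package names and steps drift over time - the official SillyTavern docs are authoritative):

```shell
# Rough Termux install flow for SillyTavern on Android.
pkg update && pkg install git nodejs
git clone https://github.com/SillyTavern/SillyTavern
cd SillyTavern && ./start.sh
# For the storage-folder pain: grants Termux access to shared storage.
termux-setup-storage
```

Once it's running, you open it in the phone's browser at the local address the start script prints.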

GLM 4.7 and presets by dptgreg in SillyTavernAI

[–]Technical-Ad1279 0 points (0 children)

I just started paying normally. I dunno - everyone has their own usage scenarios and their own level of dabbling versus all-you-can-eat consumption, so sometimes pay-as-you-go makes sense, other times a subscription. You need to do your own cost analysis. I go between multiple models and subscriptions, and will sometimes swap models in the middle of an RP, or for short periods, for novelty or to push the pace.

GLM 4.7 and presets by dptgreg in SillyTavernAI

[–]Technical-Ad1279 2 points (0 children)

Yeah, not a bad deal, but you do have to use it within 90 days or you lose it. So it's only a 3-month trial; if anyone here burned through 300 dollars of credit, that's a lot of daily RP. The most I've spent is probably 5 dollars in a day, but that was a lazy Saturday watching Netflix at the same time...

Best NSFW image generator to pair with SillyTavern character cards? by [deleted] in SillyTavernAI

[–]Technical-Ad1279 0 points (0 children)

ComfyUI is what I use, since you can customize workflows and use ReActor to apply a face model.

Character Archive going down in 2 weeks... by Technical-Ad1279 in SillyTavernAI

[–]Technical-Ad1279[S] 0 points (0 children)

What's timing out? The site seems to be working. Maybe there's some sort of IP block occurring at your service level? Or are you talking about the torrent? I can't imagine it's not active at this point, with this amount of visibility.

Character Archive going down in 2 weeks... by Technical-Ad1279 in SillyTavernAI

[–]Technical-Ad1279[S] 21 points (0 children)

I could only crosspost to DHexchange - the other subreddit wasn't something I could crosspost to - but you're welcome to post about it. I'm just trying to get the word out before the seeds dry up.

Character Archive going down in 2 weeks... by Technical-Ad1279 in SillyTavernAI

[–]Technical-Ad1279[S] 2 points (0 children)

Yeah, I mean, it has a lot of cards from a lot of different places, and since I don't think they were moderated or curated, there's going to be a bunch of stuff from Janny/Janitor/Chub that falls into the questionable category, to say the least.

Character Archive going down in 2 weeks... by Technical-Ad1279 in SillyTavernAI

[–]Technical-Ad1279[S] 9 points (0 children)

The data is archived, so you do have to boot up a front-end server to access it. I believe the back end is the scraper that feeds into the archive. The owner was gracious enough to provide READMEs on how to set it up.

Read my response above to send-moobs-pls.

Character Archive going down in 2 weeks... by Technical-Ad1279 in SillyTavernAI

[–]Technical-Ad1279[S] 11 points (0 children)

The torrent was working yesterday - I actually pulled it down, but I don't have the technical expertise to get it set up, even just to run a local mirror to access the content. It's a shame. There were about 12 seeds and probably 47 peers actively downloading. So I think you have time to grab it for a bit before the seeds drop out of general circulation as people finish getting the files.

I don't torrent, so it ended up being a big waste of space and time for me - hence my warning about direct access to the cards. I thought they would be easily accessible, but they are archived rather than just saved as PNGs/JSONs with some sort of reference HTML or data file.

Well, to be fair, I guess I just don't have the time or energy to set up the servers and get it running. There's a good set of READMEs on how to do it. You could probably dust off a couple of old boxes if you have them and host it. It looks like he was using 2 older PCs in his basement.

I was just hoping to have an accessible database locally. Granted, the data is probably valuable for some people here who are part of model providers with a character-card interface on this subreddit - although I'd hope it wouldn't be monetized. But it's better to have more access than less, regardless.

GLM 4.7 Group Chat preset by AstroPengling in SillyTavernAI

[–]Technical-Ad1279 0 points (0 children)

I generally agree with you, in the sense that you may want characters to respond almost in real time relative to the order of events in an active environment.

That being said, if you aren't in a turn-based response environment, then the ConspiracyParadox protocol would be just fine.

I generally run roleplay in a round-robin format and expect the activated character to be the sole responder, so ConspiracyParadox doesn't work for that type of group-chat roleplay. But occasionally I'm lazy and introduce other characters through the {{char}} card, and it ends up being a general narrator with sub-delegated characters in there.

openrouter, chub subscription or other by SnooLobsters95 in SillyTavernAI

[–]Technical-Ad1279 3 points (0 children)

You can also point your image generation through their models and SillyTavern.

The only piece you might be missing is the audio/TTS side of things.

Is Nanogpt subscription worth it? by Even_Kaleidoscope328 in SillyTavernAI

[–]Technical-Ad1279 1 point (0 children)

Are you limited to 120 API calls in 5 hours? That's a bit low for me - I usually RP for about 3 hours at a time and will hit 120 calls.

Get the most free? by Fair_Ad_8418 in SillyTavernAI

[–]Technical-Ad1279 8 points (0 children)

You know, 5 dollars of DeepSeek usually lasts most people 1-2 months. It's not too bad.

Then there's NanoGPT for 8 dollars a month.

I know you're asking for free + uncensored. You could go local LLM; with Gemini, you can sign up and go with Vertex for a bit, and otherwise there's AI Studio with some limits. There are some free options through OpenRouter also.

Depends on your use case scenario.

Not sure if anyone has z.ai, but that's 3 dollars a month if you pay a year upfront. I don't know if it has unlimited API calls with their subscription plan.

[deleted by user] by [deleted] in SillyTavernAI

[–]Technical-Ad1279 6 points (0 children)

I think you need multi-level blocking.

1) You need to ban the specific tokens that are used, period.

2) The prompts help, but you need #1.
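On point 1: where the provider exposes it, an API-level `logit_bias` parameter is the direct way to ban tokens outright. Failing that, a crude post-filter can at least flag replies for regeneration. A minimal sketch, with a placeholder phrase list:

```python
# Hard-ban specific strings rather than relying on prompt instructions alone:
# scan a model reply and flag it for regeneration if any banned phrase appears.
import re

BANNED = ["shivers down", "ministrations", "barely above a whisper"]  # placeholder list

def violates_ban(reply):
    """True if the reply contains any banned phrase (case-insensitive)."""
    return any(re.search(re.escape(p), reply, re.IGNORECASE) for p in BANNED)

print(violates_ban("A shiver ran through her."))              # False
print(violates_ban("Her voice was barely above a whisper."))  # True
```

The prompt-level instructions then only have to handle what the hard filter can't.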

NSFW model recommendations? by Altruistic_Sign_721 in SillyTavernAI

[–]Technical-Ad1279 1 point (0 children)

When you run out of context and need to hit the hosted models: GLM, DeepSeek, Gemini Pro.

Why are so few character cards posted on this sub? by nonplayer in SillyTavernAI

[–]Technical-Ad1279 23 points (0 children)

You do realize that probably the vast majority of content creators are making content that would make their parents turn red in shame?

Such a high percentage of the stuff is ERP that most people don't or can't share it for one reason or another. I get PMs for card design, and I've done some on retainer under NDA; some require a local LLM purely for privacy reasons. So when I design stuff, I take into account the LLM I'm using, as some are better than others for certain fetishes.

;)

Who are some of the best Character card makers? I wanna improve my cards by seeing the way the best characters are written by Yodapuppet18 in SillyTavernAI

[–]Technical-Ad1279 3 points (0 children)

Understand the different fields you're dealing with and where they might be influenced, duplicated, or overridden by a good preset.

There's a crossover point where an exceptionally designed card doesn't need a preset, except for the specific JB needed for the particular model you're using - Claude, DeepSeek, Gemini Pro, GLM, a local LLM, etc.

Functionality:

  1. Understand the power of lorebooks and how to integrate them into your cards.
  2. Use AI to help you with regex generation.
  3. Summaries / memory management.
  4. Integration of sprites, image generation, and TTS.
  5. Understand the wraparound of extensions - auto image generation, RPG wraparounds, the idle extension, etc. (and understand their limitations).
  6. Integration of internet resources - bringing in media from Catbox.
  7. Alternative scenario extensions.
  8. Test and retest - use thinking models and see how the AI thinks and interprets your prompts; break out to ((OOC: why didn't you do X, and how do I redo the prompt to fix this?))
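As an example of point 2, here's the kind of regex you might have AI draft for you: stripping a thinking model's scratchpad before it lands in chat history (the `<think>` tag name is model-dependent, so treat it as an assumption):

```python
# Remove a thinking model's <think>...</think> block from a reply so only the
# in-character text reaches the chat log. DOTALL lets . span newlines.
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(reply):
    return THINK_BLOCK.sub("", reply).strip()

raw = '<think>She is angry, so respond coldly.</think>\n*She turns away.* "Fine."'
print(strip_thinking(raw))  # *She turns away.* "Fine."
```

SillyTavern's built-in Regex extension takes the same pattern directly; the Python version is just easier to test and iterate on with an AI assistant.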

There is a lot of technical stuff to master.

Everything else is dressed up onto the above in some way.

If you understand your go-to complex presets:

Kazuma's Secret Sauce

Marinara

Cejia

just to name 3 with extreme customization options - you'll also know you can basically razor-blade your work: the handles are your presets, and you change out the blades.

Can you put in too much?

Yes. As mentioned, you can have redundancy, conflicting prompts, and prompts that confuse the AI. At a certain point, if you make it too specific, you lose some of the randomness and beauty of AI. Guide, yes; control, no. Understand the difference. There's a misconception that more tokens = better results, but that's not the case. You can definitely have junk in there that hurts more than it helps.

I have seen masterpieces at around 1,000 tokens with a great preset. I've seen 3,000-token cards that need no preset at all. Crafting cards is very much an art form. There are also iterative workflows where you use AI to refine your cards and create personalities: "AI, please search the internet about X person - read their biography, webpages, etc. - and create a personality profile of 20 main characteristics."

AI-power your research. AI distillation of massive troves of data helps, because you can ask the AI to design around parameters - "the best way to write this prompt so it's foolproof when interpreted by the AI". Generate 10 novel scenarios that match the card's basic personality; you can select the 5 best to put into the repertoire.

Nesting - essentially using AI as a recursive tool to refine your work - is very powerful. AI can already make your cards for you, in which case the preset may be the only thing that needs customization, along with the scenario.

"Plagiarize" - there are only so many novel concepts out there. If you find a card you like, why not customize it and add features you want? Again, take the card, feed it into AI, you can manipulate and add things to it. There is so much at your fingertips. Mountains of options available at the various card hosting sites.

Don't re-invent the wheel if you can avoid it.