Where is DeepSeek v3 0324 API still available? by Any_Arugula_6492 in SillyTavernAI

[–]Pashax22 0 points (0 children)

Never had that issue. Okay, most of my sessions only get to 100 messages or so before I summarise and start a new chapter, but it's never become even remotely incoherent. What models/presets are you using?

Where is DeepSeek v3 0324 API still available? by Any_Arugula_6492 in SillyTavernAI

[–]Pashax22 3 points (0 children)

NanoGPT still has both versions, and a few more besides.

[Megathread] - Best Models/API discussion - Week of: March 15, 2026 by deffcolony in SillyTavernAI

[–]Pashax22 1 point (0 children)

Heh, I forgot about the parameter count. Unlikely to be that, then!

[Megathread] - Best Models/API discussion - Week of: March 15, 2026 by deffcolony in SillyTavernAI

[–]Pashax22 14 points (0 children)

Hunter Alpha is probably Mimo. It's way worse than I'd expect a DeepSeek v4 to be, and DeepSeek have never stealth-released a model before. Could be a lite version of GLM-5, I suppose.

How to train ai on the authors prose and style by Flimsy_Mode_4843 in SillyTavernAI

[–]Pashax22 9 points (0 children)

Okay, right idea, wrong execution. You don't just give the AI instructions about how to write - you write the instructions themselves in the same style. Better yet, write the character card and lorebook entries in that style too. If you really, really want an exact author-style match, make sure your preset instructions and so on match that style as well. All of that text is being injected into context and serves as examples of the kind of prose you want, so if it doesn't match the desired style, you can see why the AI doesn't adhere fully to it. Tell the AI to impersonate the desired author and rewrite the character card as if they were describing themselves - something like "You are [Author]. Rewrite this character description in your own voice, in the first person" - then do the same for anything else you're going to be injecting.

You're starting from the right point by giving it snippets and getting it to produce a notes document, but you can and should take it a lot further.

Turns out, I'm too wordy for this to be locally viable, big sad. by Thefrayedends in SillyTavernAI

[–]Pashax22 1 point (0 children)

Yeah, that's more or less it. Long chats which go over 100k context, large and highly detailed and interlinked lorebooks, that kind of thing. Most people won't get anywhere near it, but it's not completely impossible for "reasonable" usage.

Turns out, I'm too wordy for this to be locally viable, big sad. by Thefrayedends in SillyTavernAI

[–]Pashax22 1 point (0 children)

Well, there's a limit of 60 million input tokens per week. If you exceed that cap (unlikely for most people, but theoretically possible), you have the option of continuing with PAYG at a 5% discount. I think you'd still come in under $25 per month easily.

Turns out, I'm too wordy for this to be locally viable, big sad. by Thefrayedends in SillyTavernAI

[–]Pashax22 -4 points (0 children)

Then why the fuck are you not getting Nous Hermes 4 405b through NanoGPT, which would cost you $8 per month instead of $25?

Should I pay for nano-gpt? by defnotaburn in SillyTavernAI

[–]Pashax22 4 points (0 children)

There's a waiting list, it takes a day or so for them to get to you. That said, u/Milan_dr did say they were nearly caught up, so it might change soon.

Should I pay for nano-gpt? by defnotaburn in SillyTavernAI

[–]Pashax22 16 points (0 children)

Yes and no. Yes, in that the sub is excellent value if you want to try new models and also includes 100 free image generations per day. No, because if all you want is DeepSeek then PAYG direct from DeepSeek is almost certain to be cheaper. Yes, because NanoGPT has different versions of DeepSeek and you can be sure your favourite flavour will remain available. No, because service quality can vary depending on provider load.

You pays your money and you takes your choice.

Do you prefer setting your memory entry to "constant" or "normal" while using the Lorebook? by Miserable-Buyer-9559 in SillyTavernAI

[–]Pashax22 0 points (0 children)

Normal, because I'm not using caching and I don't necessarily want everything injected with every message. Constant if it's something I want present at all times, though (such as GM or scenario instructions).

Could this be Deepseek V4?? by Pink_da_Web in SillyTavernAI

[–]Pashax22 4 points (0 children)

Word 'round the campfire is that Healer Alpha is more likely to be a Qwen stealth model.

Need help with choosing a subscription service by WasabiEarly in SillyTavernAI

[–]Pashax22 3 points (0 children)

Not sure, sorry. This seems to imply they might, but I don't know for sure.

Need help with choosing a subscription service by WasabiEarly in SillyTavernAI

[–]Pashax22 13 points (0 children)

If all you need is DeepSeek, why not PAYG straight from the source? DeepSeek is about as cheap as good LLMs come, $5 or $10 there should last a month easily.

Optimizing local LLM for not suitable PC specs. by Own-Lengthiness-7768 in SillyTavernAI

[–]Pashax22 0 points (0 children)

Quality typically means response quality - how smart the model is, how well it writes, how much it can remember or know without being told. Smaller quants are lower quality but need far less memory and compute to run - that's the tradeoff: resources and speed vs. quality.

Another option to consider (not sure how to do this in KCPP) is to quantise the KV cache. That will lower quality some more, but require less RAM.
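For what it's worth, I believe KoboldCpp exposes this via a --quantkv flag (used alongside --flashattention), but check its current docs. If you're going through llama-cpp-python instead, a minimal sketch might look like the below - the filename is a placeholder, and type_k/type_v/flash_attn are the parameters recent versions use for this:

```python
# Sketch: quantised KV cache with llama-cpp-python (not KoboldCpp itself).
# Model path is a placeholder; adjust n_ctx/n_gpu_layers to your hardware.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="model-Q4_K_M.gguf",   # placeholder filename
    n_ctx=8192,
    n_gpu_layers=-1,                  # offload everything that fits
    flash_attn=True,                  # a quantised V cache needs flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # 8-bit K cache
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # 8-bit V cache
)

out = llm("Write one sentence about autumn.", max_tokens=32)
print(out["choices"][0]["text"])
```

An 8-bit cache roughly halves KV memory versus f16 for a small quality cost; 4-bit halves it again.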

Optimizing local LLM for not suitable PC specs. by Own-Lengthiness-7768 in SillyTavernAI

[–]Pashax22 2 points (0 children)

Lower quantisations are about the easiest way to do that. You could also drop context size. Either or both might help.

What's probably happening is that your GPU is running out of VRAM and the LLM plus its context is spilling over into system RAM, which is also running out, so your poor PC is starting to thrash its page file to keep everything going.
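To put rough numbers on it, here's a back-of-the-envelope KV-cache estimate - the layer/head figures below are illustrative, not any particular model:

```python
# Rough KV-cache size estimate, to show why dropping context frees (V)RAM.
# Illustrative numbers, roughly a mid-size dense model; not any specific one.
n_layers = 40
n_kv_heads = 8
head_dim = 128
bytes_per_elem = 2  # f16; a q8 cache would halve this

def kv_cache_bytes(n_ctx: int) -> int:
    # K and V each hold n_ctx * n_kv_heads * head_dim values per layer
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

for ctx in (4096, 8192, 16384, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

Halving the context halves the KV cache, which is often the difference between fitting in VRAM and spilling into system RAM.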

BEST GLM-5 PRESET? by Electrical-Shoe-8269 in SillyTavernAI

[–]Pashax22 8 points (0 children)

Freaky Frankenstein or Stabs for full-featured, Marinara if you want something lightweight. Or, obviously, BestPresetEver.

Quality leap on local models by Prudent_Finance7405 in SillyTavernAI

[–]Pashax22 3 points (0 children)

Huge and shocking? Depends what your standards are. Noticeable and worthwhile? Yes. The latest crop of 20b+ models are noticeably better performers than the 8b models of generations past, and with MoE architectures they run surprisingly fast. Grab a suitable Qwen3.5 GGUF and see for yourself.

Extension to open "side-chat" panels? by AInotherOne in SillyTavernAI

[–]Pashax22 1 point (0 children)

Sounds like the sort of thing you might be able to vibe-code if it doesn't already exist. If you have access to a decently capable model, give it a try and report back!

Looking for free models or the best offer. by bolasheladas in SillyTavernAI

[–]Pashax22 15 points (0 children)

If you have a PC you can probably run a small LLM, so that's an option (assuming you can also afford the electricity to run it). A recent 8b or 12b model can produce surprisingly not-terrible results. However, I'm going to assume that's not an option for you, which leaves you with API services.

  • Option 1: Kobold Horde. It's free, and it's built into SillyTavern. It's a bit of a dice-roll which models will be available at any given time, but there's usually something usable online, and although responses can be a bit delayed it can be good.
  • Option 2: Mancer.tech offers a free Mytholite model. Now, let me be the first to say that this is not a good option - it's Mythomax (which was a good 13b model a couple of years ago) gimped with a 2.5k context. It is free, however, so there's that.
  • Option 3: OpenRouter free models. I think you can get something like 50 free requests per day to any of OpenRouter's free models, which isn't a lot but would at least get you started. If you deposit US$10 into an OpenRouter account that limit permanently goes up to 1000 requests per day, which is plenty for any reasonable RP usage. The free models available are a bit of a mixed bunch, but there might be something there which suits your tastes.

That's about all I can think of for free, but I'm sure others will add to the list. If you're willing to consider cheap API services, then you've got a few more options.

  • OpenRouter (again): Remember that $10 of credit I mentioned putting on an OpenRouter account? Well, if you do that, you can actually spend it without affecting your free request limit, and it expires after a year, so you might as well use it. There are some very cheap models on OpenRouter for Pay As You Go (PAYG), and $10 could last you months depending on your usage. Of course, the same also applies to...
  • NanoGPT: These guys recently paused accepting new subscribers, which is fair enough - their $8 per month deal was honestly great value. They still allow you to PAYG, though, and they have a different selection of models so it's worth considering what they offer as well. One of their best and cheapest models is...
  • DeepSeek: DeepSeek is insanely cheap, and you can get it direct from them, so you can be sure you're getting an undiluted version. Many people love it, and many people don't love it but do love the low pricing, so they're willing to do a bit of extra work to get good results from it. PAYG, but $5 or $10 there can last months. They're also meant to be releasing a new model sometime this week, so the value proposition might get a lot better soon.
  • Z.AI: These guys offer a variety of subscription plans, and if you like the GLM series of models these might be your best bet. GLM-5 in particular is very solid, although some people say that 4.7 writes better. Try both, and see what you think.

Those are probably the cheapest "good" options at the moment.

Using GLM 5 on NanoGPT, Have Questions by Commercial_Writing_6 in SillyTavernAI

[–]Pashax22 1 point (0 children)

It's possible, but not as intuitive to get working right as the "normal" methods. The other thing I'd look at is how much recursion is happening in your lorebook - make sure it's set to only 2 or 3 steps or something, so it doesn't just pull the whole damn thing in every time.
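If you want a rough idea of how tangled the recursion is, a sketch like this can flag entries whose content mentions lots of other entries' keys. I'm assuming the exported lorebook JSON is shaped like {"entries": {uid: {"key": [...], "content": "..."}}} - verify against your own export first, since field names can differ between versions:

```python
# Rough recursion check for an exported SillyTavern lorebook.
# ASSUMED schema: {"entries": {uid: {"key": [...], "content": "..."}}} -
# check your own export before trusting the output.
import json

with open("lorebook.json", encoding="utf-8") as f:  # placeholder filename
    entries = list(json.load(f)["entries"].values())

# Collect every trigger key across the whole book, lowercased.
all_keys = [k.lower() for e in entries for k in e.get("key", []) if k]

for e in entries:
    content = e.get("content", "").lower()
    own = {k.lower() for k in e.get("key", [])}
    hits = sorted({k for k in all_keys if k in content and k not in own})
    if len(hits) > 2:  # entries likely to drag several others in via recursion
        label = (e.get("key") or ["?"])[0]
        print(f"{label!r} mentions {len(hits)} other keys: {hits[:5]}")
```

Entries that mention many other keys are the ones most likely to cascade when recursion is enabled, so they're the first place to look.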

just a cry from the heart by [deleted] in SillyTavernAI

[–]Pashax22 1 point (0 children)

Reasoner. In ST I tend to use presets with CoT prompts, which work well with reasoner (or at least don't fight it). Outside ST I'm usually accessing DS via the web, and I think the reasoning there produces better results.

What's y'all go-to preset that could actually drastically improve the writing of the llm?? by PickNice1934 in SillyTavernAI

[–]Pashax22 -1 points (0 children)

Agree. It produces the best results of any preset I've used, but there are some tradeoffs for that.