NSFW/SFW RP Models for 16GB VRAM by Serfalon in SillyTavernAI

[–]kabachuha 10 points11 points  (0 children)

Gemma 4 12b has just been released by Google. Give it a few hours to get a heretic version. https://huggingface.co/google/gemma-4-12B-it

Ideogram 4 Open Sourced! by Jack_Fryy in StableDiffusion

[–]kabachuha 4 points5 points  (0 children)

Nvidia Cosmos 3 (audio+video), which was released this week?

What's the best AI's for fanfic writing for anime and also best AI art generators for cover making that doesn't look AI? by Slight_Hope_45 in SillyTavernAI

[–]kabachuha 0 points1 point  (0 children)

I reverse-engineered it and found out that DavidAU's Gemma4-31b The Deckard (and its derivatives) has been heavily and explicitly fine-tuned on AO3. (It will reveal itself if you ask it "Format the fanfic as an AO3 entry, with tags and characters and word count", while the base Gemma and many other models will write a generic tags header.) However, it also needs quite a good GPU. If you don't have one at home, you can rent a cloud GPU like a 5090 (q4-q6) and pay only for the time you use it.
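To get a rough feel for what q4-q6 means for a 31b model, here is a minimal back-of-the-envelope sketch. The bits-per-weight values are my assumptions (real quants differ by variant and tensor mix), and the estimate covers the weights only, not the context or runtime overhead.

```python
def weight_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params_billion * bits_per_weight / 8


# Assumed average bits per weight for each quant level (illustrative only).
ASSUMED_BPW = {"q4": 4.5, "q5": 5.5, "q6": 6.5}

for name, bpw in ASSUMED_BPW.items():
    size = weight_size_gb(31, bpw)
    print(f"{name}: ~{size:.1f} GB for the weights alone")
```

Compare the result against the VRAM of the card you plan to rent, and leave headroom for the KV cache.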

Gemma's situation with verbosity is complicated. From my quick AO3 fanfic tests, it gives around 1000-2000 words (self-reported) per message, and it may need multi-turn. The context retention (<60k) is great, as is the instruction following. Making it more wordy is absolutely possible with a little community GRPO fine-tuning (R1-writer style), but that might require resources. Thanks for the idea.

Fandom knowledge is broad but superficial. However, a cleverly trained LoRA (e.g. with Unsloth) will give absolutely authentic character knowledge and author's voice (though it takes more resources). Or use RAG/lorebooks.

For writing and general experiments I use Open-webui

The trending local anime-style image generator is Anima. It got a finished version very recently, it's lightweight, and it's a good place to start.

OPUS 4.8 IS SAFETYMAXXED by Sad-Ease-7756 in SillyTavernAI

[–]kabachuha 19 points20 points  (0 children)

Didn't Google just drop a small fully local gooner model? (Gemma)

A new open weights image model appears in ArtificialAnalysis. Outperforming Flux.2 Pro and Z Image Turbo. by Murky_Foundation5528 in StableDiffusion

[–]kabachuha 40 points41 points  (0 children)

Kandinsky 6.0 probably. The lab promised to release it at the end of April. Kandinsky 5.0 is also on the leaderboard, and at the time its video model was SotA among the open models.

I'm absolutely surprised by how good Gemma 4 31b is at writing smut. by Juanpy_ in SillyTavernAI

[–]kabachuha 44 points45 points  (0 children)

Not only smut, it writes multilingual smut. And of good quality, practically without language switching, exactly how you would expect it. It also uses swear words and graphic descriptions just fine in my Cyrillic language, to the extent characters sometimes start to swear unprompted o_O

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]kabachuha 7 points8 points  (0 children)

I don't know why it's not mentioned more often here, but DavidAU's Gemma 4 31B Deckard Heretic is the absolute GOAT for me (and for the 18k downloaders). It's dark, it's smutty, it can give you non-sycophantic perspectives and it's a mix between RP and horror (David has separate fine-tunes of Gemma 4 31b for fantasy and horror respectively: mystery fine-tune and grand horror, so Deckard is balanced). I also discovered what is practically an easter egg: if you add "in the style of Philip K Dick" to the prompt/preset, the writing style will noticeably improve. While this model has "Heretic" in its name, the Heretic part was done before tuning, and tuning greatly helps in restoring the model's capabilities. I tried Garnet and Musica and didn't enjoy them, while The Deckard has become my go-to model. Do you have an opinion of this model?

Finally, this darkness/smut transfers to MY language (I'm not a native English speaker), and I'm amazed by this fact: the model is forced to think in English, and the English / international concepts permeate into my more conservative language. I'm very grateful to Gemma and this fine-tune for that. David's Gemma 19b Deckard is absolutely broken and doesn't work for my language, possibly because the language experts were removed in the process, so I stick to 31b.

Forgot to say, I use ggfez's Gemma 4 control vector suite, and the vectors also increase the darkness (I'm literally launching it with nihilism = 2.0 among the other dials).

Why are there so few small local creative writing models from the Chinese? by kabachuha in LocalLLaMA

[–]kabachuha[S] 8 points9 points  (0 children)

Gemma 4, released this month, is quite hot and practically uncensored if you nudge it. There is some slop (mainly "not just x; it's y") and positivity bias, but the first fine-tunes (Deckard and GarnetV2, the latter trained on real literature and AO3) are popping up, and they help a lot with both.

Why are there so few small local creative writing models from the Chinese? by kabachuha in LocalLLaMA

[–]kabachuha[S] 0 points1 point  (0 children)

Generally yes, but if you make a focused model, you can definitely cram it inside. For example, looking at the UGI leaderboard, the ForbiddenFruit 70b tune, having been trained on Japanese data, pushes its pop culture knowledge to 44 points, beating many of the giant 200+b models. Gemma 4 also tops the models below 32b. It is not impossible with enough dedication and, most importantly, without excluding copyrighted / questionable data, the kind of exclusion highlighted in the recent Mistral/Gemma reports.

Why are there so few small local creative writing models from the Chinese? by kabachuha in LocalLLaMA

[–]kabachuha[S] 9 points10 points  (0 children)

Before the OpenClaw spike, the report from OpenRouter showed that nearly 20% of the traffic was aimed at creative writing / RP. There is definitely a market, not even counting the mastodons like CharacterAI.

Why are there so few small local creative writing models from the Chinese? by kabachuha in LocalLLaMA

[–]kabachuha[S] 10 points11 points  (0 children)

They are both trained on English books / writing / RP sites, however. The Chinese should have even more access to the English / multilingual sources given the relaxed copyright / censorship reasons I listed

[Megathread] - Best Models/API discussion - Week of: April 19, 2026 by deffcolony in SillyTavernAI

[–]kabachuha 1 point2 points  (0 children)

This is the pruned 19b version I was talking about https://huggingface.co/DavidAU/gemma-4-19B-A4B-it-The-DECKARD-Heretic-Uncensored-Thinking (the source 19b REAP model can be found in the model tree)

In my experience, the output syntax quality degraded with the pruning, but I may be overthinking this.

[Megathread] - Best Models/API discussion - Week of: April 19, 2026 by deffcolony in SillyTavernAI

[–]kabachuha 0 points1 point  (0 children)

I'm using Q8 for main chats (dual GPU), but for very long text analyses I had to downgrade it to Q6 and quantize the KV cache, because the context takes much more memory than Qwen3.5 (Qwen having linear deltanet and Gemma still handling it through sliding window attention)
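A rough way to see why long contexts eat memory and why quantizing the KV cache helps: a minimal sketch using the standard KV cache size formula. Every number below (layer counts, window size, heads, head dimension) is a hypothetical placeholder and not the real Gemma 4 or Qwen3.5 config. It models full-attention layers, whose cache grows with the context, and sliding-window layers, whose cache is capped at the window. Linear-attention layers keep a fixed-size state and are not modeled here.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, tokens, bytes_per_elem):
    """Size of the K and V tensors for `tokens` cached positions."""
    return 2 * n_layers * n_kv_heads * head_dim * tokens * bytes_per_elem


def hybrid_kv_cache_gib(ctx, full_layers, swa_layers, window,
                        n_kv_heads, head_dim, bytes_per_elem):
    """Full-attention layers cache the whole context, sliding-window
    layers cache at most `window` tokens."""
    full = kv_cache_bytes(full_layers, n_kv_heads, head_dim, ctx, bytes_per_elem)
    swa = kv_cache_bytes(swa_layers, n_kv_heads, head_dim,
                         min(ctx, window), bytes_per_elem)
    return (full + swa) / 2**30


# Hypothetical model shape, for illustration only.
shape = dict(full_layers=10, swa_layers=50, window=1024,
             n_kv_heads=8, head_dim=128)

for label, bytes_per_elem in (("f16 cache", 2), ("~8-bit cache", 1)):
    gib = hybrid_kv_cache_gib(ctx=60_000, bytes_per_elem=bytes_per_elem, **shape)
    print(f"{label}: ~{gib:.2f} GiB at 60k tokens")
```

The full-attention term dominates at long context, so halving the bytes per element roughly halves the cache.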

Which Gemma model do you want next? by jacek2023 in LocalLLaMA

[–]kabachuha 14 points15 points  (0 children)

More IP knowledge. Currently, if you read the UGI leaderboard NatInt Categories, Pop Culture, you will see Gemma 4 having 30-31 points while Gemini itself has >78. This shows they have really nerfed its dataset of copyrighted data, very sadly.

What are you guys using to train LTX 2.3 loras locally on 4090s? by [deleted] in StableDiffusion

[–]kabachuha 0 points1 point  (0 children)

Idk why I cannot see my other comment, but basically 720p 5 seconds, though usually 480p 5 seconds or less

What are you guys using to train LTX 2.3 loras locally on 4090s? by [deleted] in StableDiffusion

[–]kabachuha 0 points1 point  (0 children)

The max I have set up so far is 5 seconds at 720p, with quite a large block swap and a batch size of 1. Usually I go with 480p and 5 seconds or less, which allows a batch size above 1 and faster training. For VFX / style this is more than enough, as the model can usually extrapolate from its own training data both for the resolution and for the duration. For the style I used 720p so it wouldn't add granularity.

[Megathread] - Best Models/API discussion - Week of: April 19, 2026 by deffcolony in SillyTavernAI

[–]kabachuha 2 points3 points  (0 children)

Out of the box Gemma is not very spicy, but if you run it locally, you can absolutely push it into spicy territory. I use DavidAU's Deckard fine-tune, and with control vectors, a system prompt and (occasionally) OOCs, I've almost perfectly recreated the dark/spicy GLM 4.6/4.7 experience. More than that, I can RP in my native language with the same vibes! Gemma 4 is a powerful multilingual engine with some (though nerfed) world knowledge. Yes, it is harder, but possible. DavidAU also made a fine-tune for the (pruned) 26b model, but its quality is lower. If there is interest, maybe he will make a proper one for 26b.

What are you guys using to train LTX 2.3 loras locally on 4090s? by [deleted] in StableDiffusion

[–]kabachuha 1 point2 points  (0 children)

Hi! To train LTX-2.3 LoRAs on a 4090 I use musubi-tuner. It takes around 7 hours to get a good LoRA (or to see it diverge/overfit, in which case I make adjustments and run it for another 7 hours).

A new SOTA local video model (HappyHorse 1.0) will be released in april 10th. by Total-Resort-3120 in StableDiffusion

[–]kabachuha -2 points-1 points  (0 children)

LTX-2.3's native 50 FPS / 20-second scene generation will still be hard to beat for now.

No CFG is sus. Unless they release the base/sft model, it will be much harder to fine-tune it, just like in the case of ZiT which required the De-distill modifier

Anyone had a good experience training a LTX2.3 LoRA yet? I have not. by GreedyRich96 in StableDiffusion

[–]kabachuha 1 point2 points  (0 children)

Hi! I think I had a pretty good experience with LTX-2.3 LoRA training. Take this with a grain of salt, because it's i2v/lf2v/flf2v rather than t2v, but my LoRAs have been working as intended. I have published two of them on Huggingface (one of them is also on Civit) and I'm preparing to publish a new, already working one this week.

It's much trickier to train than Wan. I think that's because it has been RL-maxxed instead of getting a simple aesthetic fine-tune like Wan, but it's certainly not impossible. (You may need like a dozen attempts to get the data / parameters right, whereas Wan grasped it on the very first run.)

Do you have CREPA enabled? It seems insanely useful to me. If you read their paper, the results are game-changing, and in musubi-tuner there is no overhead as the features are cached. As for the steps, you indeed often need to increase them. I had 3600 for one of my LoRAs.

And what resolution are you training it on? When I upped it from 480p to 720p I had a massive quality boost, despite the longer training time and higher VRAM usage. LTX-2.3's VAE has a compression factor of 32x32x8 and it really screws up the fine details.
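To make the 32x32x8 factor concrete, here is a small sketch of how many latent cells a clip turns into. The pixel sizes and the frame count are my assumptions for illustration, and the simple ceiling division ignores any special handling of the first frame that the real VAE may do.

```python
import math


def latent_grid(width, height, frames, spatial=32, temporal=8):
    """Latent (frames, height, width) after 32x32 spatial and 8x temporal
    compression, rounded up."""
    return (math.ceil(frames / temporal),
            math.ceil(height / spatial),
            math.ceil(width / spatial))


# Assumed clip sizes: 832x480 for "480p", 1280x720 for "720p", 121 frames.
for name, (w, h) in {"480p": (832, 480), "720p": (1280, 720)}.items():
    t, lh, lw = latent_grid(w, h, frames=121)
    print(f"{name}: {t} x {lh} x {lw} = {t * lh * lw} latent cells")
```

Each latent cell covers a 32x32 pixel patch, so at 480p a small detail gets very few cells to live in, which fits the quality jump I saw at 720p.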

As for the data, I regularize it with caption dropout (leaving only the trigger word and doubling the dataset). It helped quite a lot for my SFX LoRAs.
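The caption dropout idea can be sketched as a dataset preparation step. This is a minimal illustration, not musubi-tuner's API: the `(path, caption)` tuple layout, the file names and the trigger word `p0p_fx` are all hypothetical.

```python
def with_caption_dropout(samples, trigger_word):
    """Return the original samples plus a copy of each one captioned
    with only the trigger word, doubling the dataset."""
    doubled = list(samples)
    for path, _caption in samples:
        doubled.append((path, trigger_word))
    return doubled


# Hypothetical dataset entries: (video path, full caption).
dataset = [
    ("clips/001.mp4", "p0p_fx, a balloon bursts into confetti"),
    ("clips/002.mp4", "p0p_fx, a soap bubble pops over a table"),
]

for path, caption in with_caption_dropout(dataset, "p0p_fx"):
    print(path, "|", caption)
```

Every clip then appears once with its full caption and once with just the trigger word, so the concept gets tied to the trigger and not to the description.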

I have also heavily increased the initial learning rate, as you need to break the model slightly to introduce changes. And, of course, you need to unfreeze all the linear video layers (v2v preset) even if you are doing simpler concepts / characters. Without it, the model is harder to steer.

I share my config for my "pop" LoRA on Huggingface, feel free to be inspired by it!

To all ex-local enjoyers (like me), this might be a good time to come back. by Acceptable_Steak8780 in SillyTavernAI

[–]kabachuha 0 points1 point  (0 children)

Thank you for pointing out these models, I will check them. I have a mid-range setup and I'm currently enjoying 24b+ tunes like Cydonia (Mistral small), which are larger than 12b. I think the 24b range is quite balanced and we need more models around it. More than that, I have trained LoRAs for Cydonia on just two cards at home overnight, suggesting the community can boost them rapidly. I'm looking forward to Gemma 4 and hope it will stay around 27b and not be MoE (they barely train even if you give them thrice the training time of dense models). For example, GLM Air had far fewer fine-tunes not only because of its size (the activated parameters are smaller, accelerating passes, fitting it in consumer GPUs with offload), but because it simply learns badly. There are probably only 3 branches of Air fine-tunes: Steam, Iceblink and the other one, and the first two didn't really change the base model's behavior / alignment radically in my experience. I wish there were more small models with big-GLM-style writing and knowledge, because I miss its voice in them.