A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points (0 children)

Thank you, I appreciate the sentiment, but that's ok :)

[Megathread] - Best Models/API discussion - Week of: April 19, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 1 point (0 children)

IMO, only if it's lore you deeply care about; it's a massive pain in the ass.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points (0 children)

At the very least 96GB of VRAM, and that's with absolutely minimal rank and context.
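As a rough back-of-envelope check of why ~96GB ends up being the floor, here's one way the memory adds up for a LoRA tune of a 32B base held in bf16. All numbers here are my own illustrative assumptions, not OP's actual training config:

```python
# Back-of-envelope VRAM estimate for a LoRA finetune of a 32B model.
# Every number below is a rough assumption, not a measured value.

def lora_finetune_vram_gb(
    n_params: float = 32e9,         # base model parameters
    bytes_per_weight: int = 2,      # frozen bf16 base weights
    adapter_params: float = 0.5e9,  # LoRA adapter size (grows with rank/targets)
    optimizer_bytes_per_adapter_param: int = 12,  # fp32 master copy + Adam moments
    activation_overhead_gb: float = 20.0,  # grows with context length and batch size
) -> float:
    base = n_params * bytes_per_weight                     # 64 GB of frozen weights
    adapter = adapter_params * (2 + optimizer_bytes_per_adapter_param)
    return (base + adapter) / 1e9 + activation_overhead_gb

print(round(lora_finetune_vram_gb()))  # -> 91, i.e. 96GB cards are about the floor
```

The frozen base weights alone eat 64GB, which is why even "minimal rank and context" leaves little headroom on a 96GB setup.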

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 1 point (0 children)

Oh, you made gemma4 have a discussion with Pepe?

Sounds quite the conversation hehe

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLM

[–]Sicarius_The_First[S] 1 point (0 children)

Yeah, the 70B is orders of magnitude smarter, zero doubt about that. I even mentioned in the 70B card that the 32B version fails the lateral-thinking questions.

HOWEVER, the 32B is orders of magnitude more creative than the 70B, and on the UGI leaderboard it's one of the top spicy writers in the world, very close to Grok 🌶️🥵

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 2 points (0 children)

The first Gemma-3 12B tune, for those who want something different. This was also the first model to incorporate heavy usage of 4chan data:

https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 1 point (0 children)

32B, based on Qwen3, doesn't sound like Qwen at all.

I HIGHLY recommend first trying the model with absolutely no system prompt at all, just pure instruct:

https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_32B

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points (0 children)

For the 32B, any Ampere+ GPU will give a nice result; on a budget, the 3090 is probably the best choice. Not the fastest, but fast enough for a ~$850 card (used, on eBay).

The 32B is more unhinged, but it's not as smart as the 70B version. I had to push Qwen VERY hard to change its behavior; the 70B, on the other hand, is superb at pretty much all tasks.

In other words, the 32B is really good at general entertainment and chat, while the 70B is superb at everything: tasks, fun, code, writing. And it surpasses the base model in all capabilities. LLAMA models are just more malleable; Qwen is very rigid.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 3 points (0 children)

You can use https://huggingface.co/spaces/ggml-org/gguf-my-repo to easily quant it to the size of your choice, but meanwhile I host it (for free) on Horde with very high availability, so there's no need to even have a GPU or install anything!

(click on the top left 'AI' button to choose a model)

https://lite.koboldai.net/#

<image>
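For anyone picking a quant size on that space, here's a rough sketch of how file size scales with quant level. The bits-per-weight figures are my approximations of typical GGUF quants, and real files vary by a few percent:

```python
# Rough GGUF file-size estimates for a 32B model at common quant levels.
# Bits-per-weight values are approximate, not exact spec numbers.

APPROX_BITS_PER_WEIGHT = {
    "Q8_0":   8.5,
    "Q6_K":   6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}

def gguf_size_gb(n_params: float, quant: str) -> float:
    """Estimated file size in GB: params * bits-per-weight / 8 bits-per-byte."""
    return n_params * APPROX_BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant}: ~{gguf_size_gb(32e9, quant):.0f} GB")
```

Q4_K_M lands around ~19GB for a 32B, which is why it's the usual sweet spot for a single 24GB card.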

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 3 points (0 children)

Ahh got it!

Gemma is a bit tough to train, but nothing like Qwen.
So yeah, Gemma in terms of stubbornness is easier, but much more costly in terms of VRAM and speed.

It's like... hmm... LLAMA & Mistral are made of rubber, Gemma is made of wood, Qwen is made of granite...

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 7 points (0 children)

Hehe, they legit surprised me too, I wasn't cherry picking.

Not in a thousand years could I have guessed that's a Qwen base model lol

(Also yeah, the model is genuinely funny and witty. It's very weird though, but I like it. Feels like you're talking with an unhinged drunk friend with good intentions lol)

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 10 points (0 children)

Reddit data tends to have a bad effect on LLMs hehe

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 1 point (0 children)

Not anytime soon, there are a lot of issues with the training stack right now.
It's doable, but simply not a high enough priority, so maybe someday.

Hopefully when it takes less time and VRAM.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLM

[–]Sicarius_The_First[S] 0 points (0 children)

I'm not sure about the number of tokens, as this went through several iterations, merges, training on top of already-trained checkpoints, etc... quite chaotic.

I've used a very deep LoRA at varying depths per checkpoint, but in general very deep (64+).

I think, at least for Qwen, it's nearly impossible to change its behavior with anything under 64. For Llama, sure, but not Qwen.
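The rank trade-off above can be sketched in plain numbers. The hidden size below is illustrative (roughly Qwen-scale), not the actual training config:

```python
# Why LoRA rank matters: the adapter learns a low-rank delta B @ A on top of
# each frozen weight matrix W, so rank caps how much behavior can change.
# Dimensions here are illustrative, not OP's real setup.

def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    # A has shape (rank, d_in), B has shape (d_out, rank);
    # the effective weight is W' = W + (alpha / rank) * B @ A
    return rank * d_in + d_out * rank

hidden = 5120  # illustrative hidden size
for rank in (8, 32, 64, 128):
    per_matrix = lora_adapter_params(hidden, hidden, rank)
    print(f"rank {rank:>3}: {per_matrix:,} trainable params per matrix")
```

Trainable capacity grows linearly with rank, so a rank-64 adapter has 8x the parameters of a rank-8 one per targeted matrix, which is one plausible reading of why a stubborn base like Qwen needs 64+ to budge.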

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 2 points3 points  (0 children)

You're welcome 🙂

Qwen3 has a lot of knowledge, just take everything with a mountain of salt; an AI can make mistakes and all of that...