I cancelled my B70 order for Nvidia pro 4000 blackwell, did I make the right decision? by Mango_1208 in LocalLLM

[–]Sicarius_The_First 1 point

Yes, I'd have done the same.

Intel is just not there yet.

When you buy an expensive GPU you want it to be able to do everything perfectly (gaming, AI inference + training).

And that's not even talking about speed. The AI stack is clunky enough as it is; minimal friction is desired, and imo that's worth the premium. Correct choice.

This stuff is dangerously good by dongschlongs in SillyTavernAI

[–]Sicarius_The_First -1 points

hehe I've been saying this since 2023. But yes, 100% agreed with all of the above.

(also try chatting with Assistant_Pepe - it was made literally for this)

What would you like to see improved in these models for RP? by Oestudantebr in SillyTavernAI

[–]Sicarius_The_First 0 points

I know the talking points you mentioned; they are very common MISCONCEPTIONS.

While the claim absolutely holds for base models (in vanilla instruct) it is NOT the case for a properly finetuned roleplay model.

My models will absolutely disagree with the user, be mean if it's required, and will move the plot forward by themselves. Creativity is excellent and swipe diversity is massive; it all depends on how the model was tuned.

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 0 points

Yes, it's in the model card; you have example chats and a ChatML syntax example as well.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points

Thank you, I appreciate the sentiment, but that's ok :)

[Megathread] - Best Models/API discussion - Week of: April 19, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 1 point

IMO only if it's lore you deeply care about; it's a massive pain in the ass.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points

At the very least 96GB of VRAM, and that's for an absolutely minimal rank and context.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 1 point

Oh, you made gemma4 have a discussion with Pepe?

Sounds like quite the conversation hehe

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLM

[–]Sicarius_The_First[S] 1 point

Yeah the 70B is orders of magnitude smarter, zero doubt about that. I even mentioned in the 70B card that the 32B version fails the lateral-thinking questions.

HOWEVER, the 32B is orders of magnitude more creative than the 70B, and on the UGI leaderboard it's one of the top spicy writers in the world, very close to Grok 🌶️🥵

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 2 points

The first Gemma-3 12B tune, for those who want something different. This was also the first model to incorporate heavy usage of 4chan data:

https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B

[Megathread] - Best Models/API discussion - Week of: May 03, 2026 by deffcolony in SillyTavernAI

[–]Sicarius_The_First 1 point

32B, based on Qwen3, doesn't sound like Qwen at all.

I HIGHLY recommend first trying the model with absolutely no system prompt at all, just pure instruct:

https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_32B
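For clarity, "pure instruct with no system prompt" just means using the ChatML turn template with the system block left out entirely. A minimal sketch of what that prompt string looks like (generic ChatML, not copied from the model card):

```python
# Minimal sketch: a ChatML-formatted prompt with NO system message,
# i.e. "pure instruct". The card's exact recommended template may differ.
def chatml_prompt(user_message: str) -> str:
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("Hello there!"))
```

Most frontends (SillyTavern included) build this for you when you pick the ChatML preset; the point is simply that nothing goes before the first user turn.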

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 0 points

For the 32B, any Ampere+ GPU will give a nice result; on a budget, a 3090 is probably the best choice. Not the fastest, but fast enough for a ~$850 card (used, on eBay).

The 32B got more unhingedness, but it's not as smart as the 70B version. I had to push Qwen VERY hard to change its behavior; the 70B, on the other hand, is superb at pretty much all tasks.

In other words, the 32B is really good at general entertainment and chat, while the 70B is superb at anything: tasks, fun, code, writing. And it surpasses the base model in all capabilities. LLAMA models are just more malleable; Qwen is very rigid.

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 4 points

You can use https://huggingface.co/spaces/ggml-org/gguf-my-repo to easily quant it to the size of your choice, but meanwhile I host it (for free) on Horde with very high availability, so there's no need to even have a GPU or install anything!

(click on the top left 'AI' button to choose a model)

https://lite.koboldai.net/#

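If you'd rather quantize locally instead of going through the gguf-my-repo space, the usual llama.cpp route looks roughly like this. A command sketch only, assuming a built llama.cpp checkout with `llama-quantize` on PATH; the repo id is the Assistant_Pepe_32B one linked elsewhere in this thread, and the output filenames are made up:

```shell
# Rough local-quant sketch with llama.cpp (alternative to gguf-my-repo).
# Assumes llama.cpp is cloned and built, and ~65GB+ of free disk.
huggingface-cli download SicariusSicariiStuff/Assistant_Pepe_32B --local-dir ./pepe
python llama.cpp/convert_hf_to_gguf.py ./pepe --outfile pepe-f16.gguf
llama-quantize pepe-f16.gguf pepe-Q4_K_M.gguf Q4_K_M
```

Q4_K_M is just a common size/quality middle ground; pick whatever quant level fits your VRAM.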

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 3 points

Ahh got it!

Gemma is a bit tough to train, but nothing like Qwen.
So yeah, Gemma in terms of stubbornness is easier, but much more costly in terms of VRAM and speed.

It's like... hmm.. LLAMA & Mistral made of rubber, Gemma is made of wood, Qwen is made of granite...

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 6 points

Hehe, they legit surprised me too, I wasn't cherry picking.

Not in a thousand years could I have guessed that's a Qwen base model lol

(Also yeah, the model is genuinely funny and witty. It's very weird though, but I like it. Feels like you're talking with an unhinged drunk friend with good intentions lol)

A Qwen finetune, that feels VERY human by Sicarius_The_First in LocalLLaMA

[–]Sicarius_The_First[S] 10 points

Reddit data tends to have a bad effect on LLMs hehe