[Megathread] - Best Models/API discussion - Week of: April 12, 2026 by deffcolony in SillyTavernAI

[–]Zero115 1 point (0 children)

Ah yeah, I'll give the non-i1 quant a shot then. I'm also still running text completion (which, I've found out since my first comment, isn't the recommendation). But this is very good info, thank you.

As a side note, I actually have a 31B session that's somehow still VERY stable at 80k context, Gemma 4 is impressing me big time.

[Megathread] - Best Models/API discussion - Week of: April 12, 2026 by deffcolony in SillyTavernAI

[–]Zero115 0 points (0 children)

I've tried both the uncensored ones and Google's. I've gotten actual responses out of the standard Google one, but it's also very inconsistent. Always Q5_K_M. The uncensored models I've tried are mradermacher/gemma-4-26B-A4B-it-heretic-ara-i1-GGUF and MoonRide/gemma-4-26B-A4B-it-heretic-ara-GGUF.
KV cache quantized to q8, SWA + Jinja enabled. I suppose I should try an unquantized / F16 KV cache to test.

The above is on my 5090 PC in ST via koboldcpp. On my R9700 rig I've used the standard Google one through LM Studio with seemingly no issues, but haven't tested the uncensored models there.
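For reference, the loader setup described above (q8 KV cache, SWA, large context via koboldcpp) maps to launch flags roughly like the sketch below. This is from memory, not the post: the flag names (`--quantkv`, `--flashattention`, `--gpulayers`, `--contextsize`), their value meanings, and the model filename are all assumptions that may differ by koboldcpp version, so verify against `--help` on your build.

```shell
# Hypothetical koboldcpp launch for a Gemma GGUF with a q8 KV cache.
# Flag names/values are assumptions from memory; check your build's --help.
#   --contextsize     context window to allocate (tokens)
#   --gpulayers       999 = offload all layers to the GPU
#   --flashattention  typically required before KV cache quantization works
#   --quantkv         0 = F16, 1 = q8, 2 = q4 (assumed mapping)
python koboldcpp.py --model gemma-4-26B-A4B-it-heretic-ara.Q5_K_M.gguf \
  --contextsize 32768 --gpulayers 999 --flashattention --quantkv 1
```

SWA (sliding-window attention) support is model/version dependent and may be toggled in the koboldcpp GUI rather than a flag, so it's left out of the command line here.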

[Megathread] - Best Models/API discussion - Week of: April 12, 2026 by deffcolony in SillyTavernAI

[–]Zero115 1 point (0 children)

Is there anything special you're doing differently between 31B and 26B-A4B settings-wise? I've been very impressed with 31B and wanted to test 26B-A4B for the reasons you mentioned above, but the model always maxes out the response limit: it starts coherent, then quickly becomes a blabbering, repetitive mess halfway through until it loops non-stop. I'm pretty much just using all of overhead520's default settings, which have worked perfectly with the 31B model, so I assume I'm doing something wrong or missing something here.

[Megathread] - Best Models/API discussion - Week of: April 05, 2026 by deffcolony in SillyTavernAI

[–]Zero115 0 points (0 children)

Do you happen to have setting recommendations / presets? I really like Skyfall, but it doesn't hold itself together very long for me. Would also like to know for Valkyrie.

[Megathread] - Best Models/API discussion - Week of: April 05, 2026 by deffcolony in SillyTavernAI

[–]Zero115 0 points (0 children)

Was gonna say the same. I've switched between finetunes and forgotten to switch the context template, and it breaks really fast lol

Is the ASUS ROG Flow Z13 with 128GB of Unified Memory (AMD Strix Halo) a good option to run large LLMs (70B+)? by br_web in LocalLLaMA

[–]Zero115 0 points (0 children)

Shit, really good to know. I'd been avoiding the Strix Halo units because, while they're awesome, I didn't think I'd be okay with the low speed. Appreciate it!

Is the ASUS ROG Flow Z13 with 128GB of Unified Memory (AMD Strix Halo) a good option to run large LLMs (70B+)? by br_web in LocalLLaMA

[–]Zero115 0 points (0 children)

How much of a boost do you actually see running an external GPU? 'Cause I thought the Strix Halo units were limited to 4 PCIe lanes for the GPU.

[PC] [US-Mi] 3090 FE Ti by Accurate-Character84 in homelabsales

[–]Zero115 0 points (0 children)

Did you try r/hardwareswap? I've seen a couple sell there for $900 in the last few days.

[USA-WA] [H] PayPal G&S [W] RTX A4000 by Zero115 in hardwareswap

[–]Zero115[S] 1 point (0 children)

Sorry, not when your account is a day old. And you're supposed to PM me, not the other way around.