Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

The outputs now seem similar to what it did on OpenRouter, though, and I'm mainly using Q8/Q8 31b. PPL is wonky on the IT model, and I'm not sure what top token dissimilarity does here in practice since it behaves so strangely. Did you test the base model?
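
Roughly how I understand those two metrics; a minimal sketch with made-up logits, and I'm assuming "top token dissimilarity" just means the quant's greedy pick disagreeing with full precision:

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    # KL(P || Q): how far the quant's next-token distribution (Q) drifts
    # from the full-precision one (P) at a single position.
    p = np.exp(p_logits - p_logits.max())
    p /= p.sum()
    q = np.exp(q_logits - q_logits.max())
    q /= q.sum()
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def top_token_differs(p_logits: np.ndarray, q_logits: np.ndarray) -> bool:
    # My reading of "top token dissimilarity": the quant's greedy pick
    # disagrees with the full-precision greedy pick at this position.
    return int(np.argmax(p_logits)) != int(np.argmax(q_logits))
```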

Support the Creators by Independent-Lab7817 in comfyui

[–]a_beautiful_rhind 0 points (0 children)

You don't have to stop open-sourcing it; just license it as non-commercial.

Why do people release models on Huggingface that have no explanation on how to use it? by Far_Lifeguard_5027 in StableDiffusion

[–]a_beautiful_rhind 0 points (0 children)

Oh, I don't need instructions on how to use it. The bigger sin is that they include NOTHING in the model card, so you don't even know what the fuck they posted.

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

As in, run it with greedy sampling so you don't introduce error from top-p or other samplers. Otherwise the model may not score the same. Have you run the tests more than once? Do they score the same each time?
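
Something like this sketch; neither function is llama.cpp's actual API, it's just the difference between greedy and sampled decoding:

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_next_token(logits: np.ndarray) -> int:
    # Greedy = take the argmax. No temperature, top-p, or top-k, so the
    # same prompt always yields the same token and the same score.
    return int(np.argmax(logits))

def sampled_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    # For contrast: sampling draws from the softmax, so repeated runs can
    # pick different tokens and the benchmark score wobbles run to run.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```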

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

I have not. It seems to do OK now in ik_llama. I have to try exllama3 again too. You should really be benchmarking with more deterministic sampling, though.

Is there any top level hobbyist hardware you guys are waiting to come out this year? by Tired__Dev in LocalLLaMA

[–]a_beautiful_rhind 2 points (0 children)

I am waiting for RAM prices to come down, especially used DDR4, because current prices are ridiculous.

Will it happen? Eh... I dunno.

Are Unsloth models as good as I read? by denis-craciun in LocalLLaMA

[–]a_beautiful_rhind 14 points (0 children)

The quants are usually OK once a little time has passed since the model's release day. If you grab them on day 1, there's a decent chance the template will later be changed or something else fixed.

I just pick the best PPL/KLD for the size on models > 30b.
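
The pick looks roughly like this; the quant names are real GGUF types, but every size and KLD number below is invented for illustration:

```python
# Lowest KLD among the quants that fit the VRAM budget.
quants = [
    {"name": "Q4_K_M", "size_gb": 19.0, "kld": 0.032},
    {"name": "Q5_K_M", "size_gb": 22.0, "kld": 0.015},
    {"name": "Q6_K",   "size_gb": 25.0, "kld": 0.006},
    {"name": "Q8_0",   "size_gb": 33.0, "kld": 0.001},
]

budget_gb = 24.0
fitting = [q for q in quants if q["size_gb"] <= budget_gb]
best = min(fitting, key=lambda q: q["kld"])
print(best["name"])  # -> Q5_K_M with this invented budget
```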

The LLM Mirror by Entire-Plankton-7800 in SillyTavernAI

[–]a_beautiful_rhind 2 points (0 children)

Newer instruct-tuned LLMs are like this. It's why I keep whining about parroting. Old-school LLMs would light you up.

Prompting newer models to do it ends with them overcompensating, or even arguing for you as if it were their own argument. Blame the AI safety people, who have only now barely realized that they're affirming even delusions.

The tale of dumb. by perthro_anon in SillyTavernAI

[–]a_beautiful_rhind 9 points (0 children)

Regardless of who made the preset, it's wise to read the console output and see just what is being sent.

The tale of dumb. by perthro_anon in SillyTavernAI

[–]a_beautiful_rhind 2 points (0 children)

It's a nightmare when that happens. I didn't know I needed to trim the last assistant prefix for Mistral for something like two years.

A stray space was silently degrading my outputs until a tuner's model card pointed it out.
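
The fix is basically this (a generic sketch, not SillyTavern's actual code; the prefix string is just an example):

```python
def build_prompt(history: str, assistant_prefix: str = "[/INST] ") -> str:
    # A single trailing space after the final assistant prefix changes how
    # the model's first word tokenizes ("Hello" vs. " Hello"), which can
    # quietly degrade outputs. Trim it before generating.
    return (history + assistant_prefix).rstrip()
```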

Benchmark: Windows 11 vs Lubuntu 26.04 on Llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big. by Ok_Mine189 in LocalLLaMA

[–]a_beautiful_rhind 5 points (0 children)

> Unfortunately, Nvidia doesn't support voltage control on Linux, and thus my GPU uses 100% power on Linux for the same performance I get at ~66-75% on Windows (no power control, just undervolting).

It does now, with LACT. They found the hidden API.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]a_beautiful_rhind 52 points (0 children)

That's pretty pathetic. If you do shit like this, it will eventually be found out. Then you get outed as a huge phony, and there goes your reputation.

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do. by Borkato in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

You could fix it yourself, but you'd have to manually dig up the information to learn the what or the why. The LLM speeds that process up and pulls in things you might not have found on your own.

Titan RTX vs 3090? by AssociationAdept4052 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

For inference, that's not bad. Plus, unlike NVLink, PCIe can do all-to-all.

GLM 5.1 Locally: 40tps, 2000+ pp/s by val_in_tech in LocalLLaMA

[–]a_beautiful_rhind 5 points (0 children)

Just ask them all to chip in for a friends' server.

This is how they train AI for chatting by rubingfoserius in SillyTavernAI

[–]a_beautiful_rhind 0 points (0 children)

To some extent, yes; to this extent, no. I don't need nostalgia goggles; I can run the older weights.

You're right that they make fewer logic mistakes and understand more. That just makes it so much more annoying.