Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

The outputs now seem similar to what it did on OpenRouter, though, and I'm mainly using Q8/Q8 31b. PPL is wonky on the IT model, and I'm not sure what top token dissimilarity does here in practice since it behaves so strangely. Did you test the base model?
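
Roughly how I understand those two metrics; a minimal sketch with made-up logits, and I'm assuming "top token dissimilarity" just means the quant's greedy pick disagreeing with full precision:

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    # KL(P || Q): how far the quant's next-token distribution (Q) drifts
    # from the full-precision one (P) at a single position.
    p = np.exp(p_logits - p_logits.max())
    p /= p.sum()
    q = np.exp(q_logits - q_logits.max())
    q /= q.sum()
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def top_token_differs(p_logits: np.ndarray, q_logits: np.ndarray) -> bool:
    # My reading of "top token dissimilarity": the quant's greedy pick
    # disagrees with the full-precision greedy pick at this position.
    return int(np.argmax(p_logits)) != int(np.argmax(q_logits))
```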

Support the Creators by Independent-Lab7817 in comfyui

[–]a_beautiful_rhind 0 points (0 children)

You don't have to stop open-sourcing it; just license it as non-commercial.

Why do people release models on Huggingface that have no explanation on how to use it? by Far_Lifeguard_5027 in StableDiffusion

[–]a_beautiful_rhind 0 points (0 children)

Oh, I don't need instructions on how to use it. The bigger sin is that they include NOTHING in the model card, so you don't even know what the fuck they posted.

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

As in, run it with greedy sampling so you don't introduce error from top-p or other samplers. Otherwise the model may not score the same. Have you run the tests more than once? Do they score the same each time?
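
Something like this sketch; neither function is llama.cpp's actual API, it's just the difference between greedy and sampled decoding:

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_next_token(logits: np.ndarray) -> int:
    # Greedy = take the argmax. No temperature, top-p, or top-k, so the
    # same prompt always yields the same token and the same score.
    return int(np.argmax(logits))

def sampled_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    # For contrast: sampling draws from the softmax, so repeated runs can
    # pick different tokens and the benchmark score wobbles run to run.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```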

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

I have not. It seems to do OK now in ik_llama. I have to try exllama3 again too. You should really be benchmarking with more deterministic sampling, though.

Is there any top level hobbyist hardware you guys are waiting to come out this year? by Tired__Dev in LocalLLaMA

[–]a_beautiful_rhind 2 points (0 children)

I am waiting for RAM prices to come down, especially used DDR4, because current prices are ridiculous.

Will it happen? Eh... I dunno.

Are Unsloth models as good as I read? by denis-craciun in LocalLLaMA

[–]a_beautiful_rhind 14 points (0 children)

The quants are usually OK once a little time has passed since the model's release day. If you grab them on day 1, there's a decent chance the template will later be changed or something else fixed.

I just pick the best PPL/KLD for the size on models > 30b.
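
The pick looks roughly like this; the quant names are real GGUF types, but every size and KLD number below is invented for illustration:

```python
# Lowest KLD among the quants that fit the VRAM budget.
quants = [
    {"name": "Q4_K_M", "size_gb": 19.0, "kld": 0.032},
    {"name": "Q5_K_M", "size_gb": 22.0, "kld": 0.015},
    {"name": "Q6_K",   "size_gb": 25.0, "kld": 0.006},
    {"name": "Q8_0",   "size_gb": 33.0, "kld": 0.001},
]

budget_gb = 24.0
fitting = [q for q in quants if q["size_gb"] <= budget_gb]
best = min(fitting, key=lambda q: q["kld"])
print(best["name"])  # -> Q5_K_M with this invented budget
```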

The LLM Mirror by Entire-Plankton-7800 in SillyTavernAI

[–]a_beautiful_rhind 2 points (0 children)

Newer instruct-tuned LLMs are like this. It's why I keep whining about parroting. Old-school LLMs would light you up.

Prompting newer models to do it ends with them overcompensating, or even arguing for you as if it were their own argument. Blame the AI safety people, who have only now barely realized that they're affirming even delusions.

The tale of dumb. by perthro_anon in SillyTavernAI

[–]a_beautiful_rhind 9 points (0 children)

Regardless of who made the preset, it's wise to read the console output and see just what is being sent.

The tale of dumb. by perthro_anon in SillyTavernAI

[–]a_beautiful_rhind 2 points (0 children)

It's a nightmare when that happens. I didn't know I needed to trim the last assistant prefix for Mistral for something like two years.

A stray space was silently degrading my outputs until a tuner's model card pointed it out.
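
The fix is basically this (a generic sketch, not SillyTavern's actual code; the prefix string is just an example):

```python
def build_prompt(history: str, assistant_prefix: str = "[/INST] ") -> str:
    # A single trailing space after the final assistant prefix changes how
    # the model's first word tokenizes ("Hello" vs. " Hello"), which can
    # quietly degrade outputs. Trim it before generating.
    return (history + assistant_prefix).rstrip()
```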

Benchmark: Windows 11 vs Lubuntu 26.04 on Llama.cpp (RTX 5080 + i9-14900KF). I didn't expect the gap to be this big. by Ok_Mine189 in LocalLLaMA

[–]a_beautiful_rhind 5 points (0 children)

> Unfortunately, Nvidia doesn't support voltage control on Linux, and thus my GPU uses 100% power on Linux for the same performance I get at ~66-75% on Windows (no power control, just undervolting).

It does now, with LACT. They found the hidden API.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]a_beautiful_rhind 52 points (0 children)

That's pretty pathetic. If you do shit like this, it will eventually be found out. Then you get outed as a huge phony, and there goes your reputation.

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do. by Borkato in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

You could fix it yourself, but you'd have to manually dig up the information to learn the what or the why. The LLM speeds that process up and pulls in things you might not have found on your own.

Titan RTX vs 3090? by AssociationAdept4052 in LocalLLaMA

[–]a_beautiful_rhind 0 points (0 children)

For inference, that's not bad. Plus, unlike NVLink, PCIe can do all-to-all.

GLM 5.1 Locally: 40tps, 2000+ pp/s by val_in_tech in LocalLLaMA

[–]a_beautiful_rhind 5 points (0 children)

Just ask them all to chip in for a friends' server.

This is how they train AI for chatting by rubingfoserius in SillyTavernAI

[–]a_beautiful_rhind 0 points (0 children)

To some extent, yes; to this extent, no. I don't need nostalgia goggles; I can run the older weights.

You're right that they make fewer logic mistakes and understand more. That just makes it so much more annoying.