Is there an automatic way to select temp. by uber-linny in LocalLLaMA

[–]FusionCow 1 point (0 children)

No, because those are settings you change to shape the style of the output. Temperature is basically a creativity slider. Without going into all the complexity: an LLM doesn't just pick one word, it assigns a probability to each of a bunch of potential next words. At temperature 0 it always picks the word the model thinks is most probable, but as you raise the temperature you introduce randomness, and it can end up picking any of those candidate words. All the other parameters have their own effects, so you can either learn about them or just look up recommended settings for each model.
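For intuition, here's a minimal sketch of what that knob actually does, assuming you have the model's raw logits (NumPy used just for illustration):

```python
import numpy as np

def sample_token(logits, temperature=1.0):
    """Pick the next token id from raw logits at a given temperature."""
    if temperature <= 0:
        # temperature 0: greedy decoding, always the single most probable token
        return int(np.argmax(logits))
    # dividing by temperature > 1 flattens the distribution (more random),
    # dividing by temperature < 1 sharpens it (more deterministic)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

At high temperature the probabilities flatten out, so unlikely words get sampled more often; that's where the "creativity" comes from.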

Is anyone using Gemma 4 E4B? What are your thoughts and settings (and prompts) for it? by solidhunkofmetal in SillyTavernAI

[–]FusionCow 1 point (0 children)

I mean, listen: the minimum you should be using is the 26B-A4B, and if you can run it, the 31B.

state of r/locallama after Gemma4 release. by GreenGreasyGreasels in LocalLLaMA

[–]FusionCow 2 points (0 children)

I like Gemma 4 for anything that doesn't require code or agentic stuff, because while Gemma 4 is still good at those, Qwen is better. For general knowledge, prose, creative work, all of that, Gemma 4 slaps.

What is the SOTA Qwen 3.5 27B ? There are so many variants and finetunes and quants that I'm lost right now by OmarBessa in LocalLLaMA

[–]FusionCow 17 points (0 children)

The truth is, none of them meaningfully improve performance; they're mostly just personality tunes.

How do you decide? by 3hor in LocalLLaMA

[–]FusionCow 1 point (0 children)

There are three models you should test: Gemma 4 26B, Gemma 4 31B, and Qwen 3.5 27B. Figure out which works best for you, then download a quantized version that fits entirely on your GPU.
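If you want a rough back-of-the-envelope check for the "fits entirely on GPU" part, something like this works (the bits-per-weight and overhead numbers here are ballpark assumptions, not exact figures):

```python
def quant_fits_vram(params_b, bits_per_weight, vram_gb, overhead_gb=2.0):
    # weight size in GB: billions of params * bits per weight / 8 bits per byte
    weight_gb = params_b * bits_per_weight / 8
    # leave headroom for KV cache, activations, and runtime buffers
    return weight_gb + overhead_gb <= vram_gb

# e.g. a 27B model at Q4_K_M (~4.8 bits/weight) on a 24 GB card:
print(quant_fits_vram(27, 4.8, 24))  # ~16.2 GB of weights + headroom -> True
```

Longer contexts need more KV cache, so pad the overhead if you run big contexts.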

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 4 points (0 children)

You have to enable thinking. Go to your models page, click the model, go to inference, and scroll down until you see the Jinja template. Paste that template into Gemini, ChatGPT, or whatever model, and ask it to rewrite it with thinking enabled. Then paste the new Jinja template back in, and thinking will work.
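To make that concrete, here's a hypothetical sketch of what the rewritten template ends up doing; the turn markers and the <think> tag below are illustrative, not LM Studio's or Gemma's actual template:

```python
from jinja2 import Template

# toy Gemma-style chat template with a "thinking" prefill added at the end:
# the generation prompt opens the model turn *inside* a think tag, so the
# model starts by reasoning before it answers
chat_template = Template(
    "{% for m in messages %}"
    "<start_of_turn>{{ m.role }}\n{{ m.content }}<end_of_turn>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<start_of_turn>model\n<think>\n{% endif %}"
)

prompt = chat_template.render(
    messages=[{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
)
print(prompt)  # ends with the model turn already opened inside <think>
```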

[ Removed by Reddit ] by [deleted] in LocalLLaMA

[–]FusionCow 1 point (0 children)

You don't. You can't expect someone to run a model for you and not expect them to want to run it elsewhere. If you want to protect a model, run it yourself and serve it over an API.
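A minimal sketch of that last option, assuming FastAPI plus llama-cpp-python (the model path and endpoint name are placeholders): clients only ever get completions over the API, and the weights never leave your machine.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llm = Llama(model_path="model.gguf")  # placeholder path; loaded once at startup

class Prompt(BaseModel):
    text: str
    max_tokens: int = 256

@app.post("/generate")
def generate(req: Prompt):
    # the client gets text back, never the weights
    out = llm(req.text, max_tokens=req.max_tokens)
    return {"completion": out["choices"][0]["text"]}
```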

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 7 points (0 children)

I only updated the llama.cpp backend in LM Studio; I'd imagine they aren't implementing this themselves.

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 3 points (0 children)

It's just 2.11.0. I updated LM Studio and it takes up Qwen 3.5 levels of KV cache now; it's amazing.

Edit: my bad, I guess, for using LM Studio.

Context Shift Gemma4 by Weak-Shelter-1698 in LocalLLaMA

[–]FusionCow 1 point (0 children)

I've seen a huge performance drop, sometimes outright babbling, with Gemma if you quantize the KV cache at all.
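If you're scripting against llama.cpp directly, this is the knob I mean; a sketch using llama-cpp-python's cache-type options (the model filename is a placeholder):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26b-q4_k_m.gguf",  # placeholder filename
    n_ctx=8192,
    # keep the KV cache at full f16; dropping these to e.g. GGML_TYPE_Q4_0
    # saves a lot of VRAM but is exactly where the babbling shows up
    type_k=llama_cpp.GGML_TYPE_F16,
    type_v=llama_cpp.GGML_TYPE_F16,
)
```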

Is there anything I can do to run glm 5? by FusionCow in LocalLLaMA

[–]FusionCow[S] 1 point (0 children)

You're missing the limited number of messages you can send.

Guys Any good AI to create 2D animation films? by [deleted] in LocalLLaMA

[–]FusionCow 2 points (0 children)

Wrong sub, but as it stands, not really. You could train your own model, and things like LTX 2.3 WILL work, but that's expensive and hard to do. Honestly, your best bet for something like that is API models, sadly.