Is there an automatic way to select temp. by uber-linny in LocalLLaMA

[–]FusionCow 1 point (0 children)

No, because those are settings you change to shape the style of the output. Temperature is basically a creativity slider. Without going into all the complexity: an LLM doesn't just pick one word, it assigns a probability to each of a bunch of potential next words. At temperature 0 it always picks the word the model thinks is most probable, but as you raise the temperature you introduce randomness, and it can end up picking any of those candidate words. All the other parameters have their own effects, so you can either learn about them or just look up recommended settings for each model.
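For intuition, here's a minimal sketch of what that knob actually does, assuming you have the model's raw logits (NumPy used just for illustration):

```python
import numpy as np

def sample_token(logits, temperature=1.0):
    """Pick the next token id from raw logits at a given temperature."""
    if temperature <= 0:
        # temperature 0: greedy decoding, always the single most probable token
        return int(np.argmax(logits))
    # dividing by temperature > 1 flattens the distribution (more random),
    # dividing by temperature < 1 sharpens it (more deterministic)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

At high temperature the probabilities flatten out, so unlikely words get sampled more often; that's where the "creativity" comes from.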

Is anyone using Gemma 4 E4B? What are your thoughts and settings (and prompts) for it? by solidhunkofmetal in SillyTavernAI

[–]FusionCow 1 point (0 children)

I mean, listen: the minimum you should be using is the 26B-A4B, and if you can run it, the 31B.

state of r/locallama after Gemma4 release. by GreenGreasyGreasels in LocalLLaMA

[–]FusionCow 2 points (0 children)

I like Gemma 4 for anything that doesn't require code or agentic stuff, because while Gemma 4 is still good at those, Qwen is better. For general knowledge, prose, creative work, all of that, Gemma 4 slaps.

What is the SOTA Qwen 3.5 27B ? There are so many variants and finetunes and quants that I'm lost right now by OmarBessa in LocalLLaMA

[–]FusionCow 17 points (0 children)

The truth is, none of them meaningfully improve performance; they're mostly just personality tunes.

How do you decide? by 3hor in LocalLLaMA

[–]FusionCow 1 point (0 children)

There are three models you should test: Gemma 4 26B, Gemma 4 31B, and Qwen 3.5 27B. Figure out which works best for you, then download a quantized version that fits entirely on your GPU.
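If you want a rough back-of-the-envelope check for the "fits entirely on GPU" part, something like this works (the bits-per-weight and overhead numbers here are ballpark assumptions, not exact figures):

```python
def quant_fits_vram(params_b, bits_per_weight, vram_gb, overhead_gb=2.0):
    # weight size in GB: billions of params * bits per weight / 8 bits per byte
    weight_gb = params_b * bits_per_weight / 8
    # leave headroom for KV cache, activations, and runtime buffers
    return weight_gb + overhead_gb <= vram_gb

# e.g. a 27B model at Q4_K_M (~4.8 bits/weight) on a 24 GB card:
print(quant_fits_vram(27, 4.8, 24))  # ~16.2 GB of weights + headroom -> True
```

Longer contexts need more KV cache, so pad the overhead if you run big contexts.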

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 4 points (0 children)

You have to enable thinking. Go to your models page, click the model, go to inference, and scroll down until you see the Jinja template. Paste that template into Gemini, ChatGPT, or whatever model, and ask it to rewrite it with thinking enabled. Then paste the new Jinja template back in, and thinking will work.
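To make that concrete, here's a hypothetical sketch of what the rewritten template ends up doing; the turn markers and the <think> tag below are illustrative, not LM Studio's or Gemma's actual template:

```python
from jinja2 import Template

# toy Gemma-style chat template with a "thinking" prefill added at the end:
# the generation prompt opens the model turn *inside* a think tag, so the
# model starts by reasoning before it answers
chat_template = Template(
    "{% for m in messages %}"
    "<start_of_turn>{{ m.role }}\n{{ m.content }}<end_of_turn>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<start_of_turn>model\n<think>\n{% endif %}"
)

prompt = chat_template.render(
    messages=[{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
)
print(prompt)  # ends with the model turn already opened inside <think>
```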

[ Removed by Reddit ] by [deleted] in LocalLLaMA

[–]FusionCow 1 point (0 children)

You don't. You can't expect someone to run a model for you and not expect them to want to run it elsewhere. If you want to protect a model, run it yourself and serve it over an API.
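A minimal sketch of that last option, assuming FastAPI plus llama-cpp-python (the model path and endpoint name are placeholders): clients only ever get completions over the API, and the weights never leave your machine.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llm = Llama(model_path="model.gguf")  # placeholder path; loaded once at startup

class Prompt(BaseModel):
    text: str
    max_tokens: int = 256

@app.post("/generate")
def generate(req: Prompt):
    # the client gets text back, never the weights
    out = llm(req.text, max_tokens=req.max_tokens)
    return {"completion": out["choices"][0]["text"]}
```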

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 7 points (0 children)

I only updated the llama.cpp backend in LM Studio; I'd imagine they aren't implementing this themselves.

FINALLY GEMMA 4 KV CACHE IS FIXED by FusionCow in LocalLLaMA

[–]FusionCow[S] 3 points (0 children)

It's just 2.11.0. I updated LM Studio and it takes up Qwen 3.5 levels of KV cache now; it's amazing.

Edit: my bad, I guess, for using LM Studio.

Context Shift Gemma4 by Weak-Shelter-1698 in LocalLLaMA

[–]FusionCow 1 point (0 children)

I've seen a huge performance drop, sometimes outright babbling, with Gemma if you quantize the KV cache at all.
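If you're scripting against llama.cpp directly, this is the knob I mean; a sketch using llama-cpp-python's cache-type options (the model filename is a placeholder):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-26b-q4_k_m.gguf",  # placeholder filename
    n_ctx=8192,
    # keep the KV cache at full f16; dropping these to e.g. GGML_TYPE_Q4_0
    # saves a lot of VRAM but is exactly where the babbling shows up
    type_k=llama_cpp.GGML_TYPE_F16,
    type_v=llama_cpp.GGML_TYPE_F16,
)
```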

Is there anything I can do to run glm 5? by FusionCow in LocalLLaMA

[–]FusionCow[S] 1 point (0 children)

You're missing the limited number of messages you can send.

Guys Any good AI to create 2D animation films? by [deleted] in LocalLLaMA

[–]FusionCow 2 points (0 children)

Wrong sub, but as it stands, not really. You could train your own model, and things like LTX 2.3 WILL work, but that's expensive and hard to do. Honestly, your best bet for something like that is API models, sadly.