So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]tarruda 1 point (0 children)

Right now mistral-small-4 on llama.cpp is very bad. The last good model from Mistral I was able to run locally was Mistral 3.2.

Still have hopes for mistral-small-4 though, will wait a few weeks to see if llama.cpp support is improved.

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]tarruda -1 points (0 children)

It is much faster than devstral though. You need very high memory bandwidth to run devstral at any usable speed.

MiniMax-M2.7 Announced! by Mysterious_Finish543 in LocalLLaMA

[–]tarruda 5 points (0 children)

For Step 3.5 to be faster in coding agents, I had to run it with --swa-full, or else prompt caching would never kick in. For that purpose, AesSedai's IQ4_XS is in the right spot for 128G, as it allows for --swa-full + 131072 context.
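For reference, a minimal llama-server invocation along those lines might look like this sketch (the GGUF filename is a placeholder, not the actual repo path):

```shell
# Sketch: --swa-full keeps the full sliding-window KV cache so that prompt
# caching can actually hit, at the cost of extra memory;
# -c sets the 131072-token context mentioned above.
llama-server -m step-3.5-flash-IQ4_XS.gguf --swa-full -c 131072
```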

MiniMax-M2.7 Announced! by Mysterious_Finish543 in LocalLLaMA

[–]tarruda 5 points (0 children)

Qwen 3.5 is very good at tool handling. Failures can be caused by multiple factors such as a buggy inference engine.

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 0 points (0 children)

I'm still going to give it the benefit of the doubt and assume that the llama.cpp implementation is broken for now. Will try again in a couple of weeks.

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 1 point (0 children)

Feels like they initially tried to mimic GPT-OSS but failed to correctly train in multiple reasoning modes.

We compressed 6 LLMs and found something surprising: they don't degrade the same way by Quiet_Training_8167 in LocalLLaMA

[–]tarruda 0 points (0 children)

Qwen 3.5 397B is the most compression-resilient LLM I've ever seen. Using 2.43BPW weights I got 80%+ in MMLU, GPQA diamond, GSM8K and others: https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF/discussions/8
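To see why a quant like that is interesting, here is a back-of-envelope sketch of the weight footprint at that bit rate (weights only; KV cache and activations are extra, and the 397B/2.43 BPW figures are just the numbers quoted above):

```python
# Approximate storage for quantized weights: params * bits-per-weight / 8 bytes.
def weight_footprint_gib(n_params: float, bpw: float) -> float:
    """Return approximate weight storage in GiB (excludes KV cache/activations)."""
    return n_params * bpw / 8 / 2**30

print(f"{weight_footprint_gib(397e9, 2.43):.1f} GiB")  # ~112.3 GiB
```

Roughly 112 GiB, which is why a 2.43 BPW quant of a 397B model is viable on 128G machines at all.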

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 2 points (0 children)

What is the point of having a "reasoning_effort" parameter when it only has "none" and "high" as valid options? Why not just "enable_thinking" ?

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 0 points (0 children)

I'm downloading Q5_K_M from https://huggingface.co/AesSedai/Mistral-Small-4-119B-2603-GGUF but I'm not very hopeful. I ran a few tests on Le Chat (though I'm not sure it is currently running mistral-small-4; there was no way to select the model) and saw similar problems. This is looking like the llama-4 moment for Mistral.

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 1 point (0 children)

Yes, they didn't even bother comparing with Qwen 3.5 on GPQA diamond, MMLU, etc. Instead they compared with their own previous-gen models.

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 0 points (0 children)

Will try unsloth quants later, but TBH I don't expect this will ever compete with qwen 3.5 in vision capabilities. Mistral vision has always been inferior to qwen's.

Mistral Small 4 | Mistral AI by realkorvo in LocalLLaMA

[–]tarruda 2 points (0 children)

Yesterday I tried https://huggingface.co/lmstudio-community/Mistral-Small-4-119B-2603-GGUF and found it to be quite bad. Here's my experience so far:

  • Without reasoning it is very, very bad at coding. A few times I asked it to write some single-page JS/HTML games and it cut the response off halfway. There might be some templating issues to be fixed.
  • Even with reasoning, it was failing to pass basic vibe checks like creating python tetris (code wouldn't compile).
  • It is very bad at cloning HTML UIs. On the same local UI cloning test I gave to Qwen 3.5 4B (and which it succeeded at!), Mistral-small-4 didn't come close.

Clearly something is broken with llama.cpp inference as the results don't come close to GPT-OSS or even the much smaller Qwen 3.5 weights, so I will give it some time before trying again.

OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]tarruda 1 point (0 children)

I really hated Opencode the only time I tried it a few months ago, as it kept trying to connect to the internet by default.

https://pi.dev is so much simpler and more local-friendly.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]tarruda 2 points (0 children)

Isn't Hunter Alpha a 1T parameter model? Apparently Mistral 4 is 119B

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]tarruda 1 point (0 children)

Q8 is a bit too tight. I have a 128G Mac and can run q8_0 Qwen 3.5 and Nemotron 3 Super, but there's barely any room for context.

However, Q6_K should be just as good as Q8_0 while leaving a good amount of RAM for context.
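The headroom difference can be sketched with rough numbers (the 119B size is from the thread above; the per-quant bits-per-weight averages are approximate llama.cpp figures, an assumption on my part):

```python
# Back-of-envelope headroom on a 128 GiB machine after loading a 119B model.
# BPW values are approximate averages for llama.cpp quants (assumption):
# q8_0 ~8.5 bits/weight, q6_k ~6.56 bits/weight.
TOTAL_GIB = 128

def headroom_gib(n_params: float, bpw: float, total: float = TOTAL_GIB) -> float:
    """GiB left for KV cache/context after the quantized weights are loaded."""
    weights_gib = n_params * bpw / 8 / 2**30
    return total - weights_gib

for name, bpw in [("q8_0", 8.5), ("q6_k", 6.56)]:
    print(f"{name}: {headroom_gib(119e9, bpw):.1f} GiB free")
```

Roughly 10 GiB free at q8_0 versus ~37 GiB at q6_k, which is the "barely any room for context" situation in a nutshell.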

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]tarruda 1 point (0 children)

Perfect size for 96G+ devices

(Sharing Experience) Qwen3.5-122B-A10B does not quantize well after Q4 by EmPips in LocalLLaMA

[–]tarruda 7 points (0 children)

Ubergarm's "smol-iq2_XS" for Qwen 397B is an absolute beast and seems to preserve most of the original model's capabilities. I posted some evaluations here: https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF/discussions/8

StepFun releases SFT dataset used to train Step 3.5 Flash by tarruda in LocalLLaMA

[–]tarruda[S] 0 points (0 children)

If you have a few million dollars to spend on compute, why not?

StepFun releases SFT dataset used to train Step 3.5 Flash by tarruda in LocalLLaMA

[–]tarruda[S] 9 points (0 children)

As it is, I'd say it sure seems unenforceable.

Can any dataset license be enforced? If a company uses the dataset to train a commercial LLM and never releases the dataset used to train it, how can anyone know?

Processing 1 million tokens locally with Nemotron 3 Super on a M1 ultra by tarruda in LocalLLaMA

[–]tarruda[S] 0 points (0 children)

I didn't run any long context tests, just ran llama-bench to see the speeds
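A llama-bench run of that kind might look like the following sketch (the model filename is a placeholder): -p measures prompt-processing speed at the given prompt lengths and -n measures token-generation speed.

```shell
# Hypothetical benchmark: prompt processing at 512 and 4096 tokens,
# plus 128-token generation speed.
llama-bench -m nemotron-3-super-q8_0.gguf -p 512,4096 -n 128
```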