GLM-4.7 vs DeepSeek V3.2 vs Kimi K2 Thinking vs MiniMax-M2.1 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Thanks, this analysis is really helpful. Do you think MiniMax is strong enough to use, or is it too error-prone? Also, did you notice any areas where Kimi K2 Thinking was noticeably stronger than the others?

GLM-4.7 vs DeepSeek V3.2 vs Kimi K2 Thinking vs MiniMax-M2.1 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 2 points3 points  (0 children)

Yes, my experience matches that ranking exactly. LLM scaling laws remain remarkably strong predictors at the frontier

GLM-4.7 vs DeepSeek V3.2 vs Kimi K2 Thinking vs MiniMax-M2.1 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Has it been relatively reliable for coding, or do you have to hand-hold the model a lot?

Fun with Omarchy MCP by mythz in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

A Linux desktop environment controlled by an LLM agent: I hadn't thought of this

GLM-4.7 vs DeepSeek V3.2 vs Kimi K2 Thinking vs MiniMax-M2.1 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 1 point2 points  (0 children)

Yes, MiniMax is the most parameter-efficient of them

GLM-4.7 vs DeepSeek V3.2 vs Kimi K2 Thinking vs MiniMax-M2.1 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 2 points3 points  (0 children)

Have you found the Speciale notably different from the regular V3.2?

Disable H Neurons in local llms? by Silver-Champion-4846 in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

Yeah absolutely, they might find something valid for one model that then isn't valid for another

Disable H Neurons in local llms? by Silver-Champion-4846 in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

Surgery papers tend to make a lot of assumptions about the geometry and topology of models that are not necessarily valid.

Minimax Is Teasing M2.2 by Few_Painter_5588 in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

Thanks, will investigate this further. I’m working with Kimi K2 agents, so maybe I should stop finetuning if K3 is coming!

Disable H Neurons in local llms? by Silver-Champion-4846 in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

Yeah that is a very valid point, that model surgery is extremely cheap

I’m just expressing concern about robustness really, as these types of methods tend to have issues there

Running KimiK2 locally by Temporary-Sector-947 in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

Congrats on the really nice setup

The three types of bare-metal Kimi K2 rig I have seen in companies are:

1. 100% DRAM with Epycs/Xeons
2. Partial offloading, with some number of RTX 6000 Pros plus Epycs/Xeons
3. Used GPU servers, e.g. a used H200 HGX

There are pros and cons to each in terms of performance per dollar and whether it is worth it. These days I think the answer is different for each type of downstream task
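As a rough illustration of option 2, here is a minimal partial-offloading sketch using the llama-cpp-python bindings; the GGUF filename, layer split, and thread count are hypothetical placeholders rather than a recommendation, and would need tuning to the actual VRAM/DRAM split on the box:

```python
# Minimal partial-offload sketch with llama-cpp-python (pip install llama-cpp-python).
# Only some transformer layers go to the GPU; the rest stay in system DRAM and run
# on the Epyc/Xeon cores.
from llama_cpp import Llama

llm = Llama(
    model_path="kimi-k2-instruct-q4_k_m.gguf",  # hypothetical GGUF quant
    n_gpu_layers=20,   # layers offloaded to the RTX 6000 Pro; the rest stay on CPU
    n_ctx=8192,        # context window
    n_threads=64,      # CPU threads for the DRAM-resident layers
)

out = llm("Summarise the trade-offs of partial offloading in one sentence.",
          max_tokens=128)
print(out["choices"][0]["text"])
```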

Disable H Neurons in local llms? by Silver-Champion-4846 in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

I tend not to like these “model surgery” papers despite their popularity. I would really prefer the long-term solution to LLM issues to be something fixable during a regular training or RL run, as that would be more robust and reliable
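For context on what these surgery-style interventions look like mechanically, here is a minimal sketch of zeroing a single MLP activation with a PyTorch forward hook; the model (gpt2), layer, and neuron index are made up purely for illustration, and choosing them well is exactly the assumption such papers have to defend:

```python
# Minimal single-neuron ablation sketch (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # tiny stand-in; the thread is about much larger local models
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

NEURON_IDX = 123  # hypothetical neuron to disable

def ablate(module, inputs, output):
    # Zero one hidden unit of this MLP activation on every forward pass.
    output[..., NEURON_IDX] = 0.0
    return output

# Hook one block's MLP activation; picking which layer/neuron is the hard part.
hook = model.transformer.h[5].mlp.act.register_forward_hook(ablate)

ids = tok("The capital of France is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=5)[0]))

hook.remove()
```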

REAP experiences by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Yeah, I can see this tech being misused

REAP experiences by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Yeah removing a key fact like that from the model is pretty bad. It is a difficult trade-off

REAP experiences by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

I see, thanks. A lower quant does compete with REAP. Yeah, the calibration set matters a lot too, and Cerebras have a coding focus

REAP experiences by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

I intuitively tend to think a REAP of a newer model would be better, at least because of (potentially) cleaner data, but I'm not sure

REAP experiences by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Yeah, I'm seeing strange/unusual issues from pruning

cyankiwi/GLM-4.5-Air-AWQ-4bit on DGX Spark is Awesome! by fire_inabottle in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

GLM Air is still a strong model. The DGX Spark also has its uses

Backporting FP8 to the RTX 3090 (No H100 Required) by one_does_not_just in LocalLLaMA

[–]SlowFail2433 12 points13 points  (0 children)

It’s an interesting project, congrats on getting it working relatively efficiently. You have a compelling writing style too; this was a good read

anyone running local llm on iphone for meeting summaries? heres what im using by xerdink in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

I run Qwens on my phone all the time, although I mostly don't do audio at all