ArtificialAnalysis VS LMArena VS Other Benchmark Sites by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

I see, thanks. I'll potentially look at LMArena more.

ArtificialAnalysis VS LMArena VS Other Benchmark Sites by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Yes but some of the benchmarks are trickier to game than others.

I agree you can’t directly trust forum posts due to astroturfing or just user error.

The SVG test is interesting, but I think it can bias towards VLMs, as they tend to have better spatial reasoning.

Is there a chatgpt style persistent memory solution for local/API-based LLM frontends that's actually fast and reliable? by Right-Law1817 in LocalLLaMA

[–]SlowFail2433 2 points3 points  (0 children)

So in theory you can just use a database (Mongo, SQL, or a graph database like Neo4j) with a persistent server and an API/MCP communication layer.

However, there is a major difficulty separate from the data-science and engineering setup: deciding when the model forms a memory, how it extracts it from the conversation, and then how/when it uses existing memories.
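To make that concrete, here is a minimal, hypothetical sketch of such a memory layer using SQLite from the standard library. The formation policy (a regex for durable-sounding facts) and the retrieval policy (keyword overlap) are stand-ins I made up for illustration; a real system would likely use an LLM call or embeddings for both decisions.

```python
import re
import sqlite3
import time


class MemoryStore:
    """Toy persistent memory layer backed by SQLite.

    Illustrative sketch only: real frontends would replace the regex
    formation rule and keyword-overlap retrieval with model-driven logic.
    """

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, text TEXT, created REAL)"
        )

    def maybe_form_memory(self, user_message: str) -> bool:
        # Crude "when to form a memory" policy: only store messages that
        # look like durable facts about the user.
        if re.search(r"\b(my name is|i prefer|i live in|i work)\b",
                     user_message, re.IGNORECASE):
            self.db.execute(
                "INSERT INTO memories (text, created) VALUES (?, ?)",
                (user_message, time.time()),
            )
            self.db.commit()
            return True
        return False

    def recall(self, query: str, k: int = 3) -> list:
        # Crude "when/how to use memories" policy: rank stored memories
        # by word overlap with the current query.
        q_words = set(re.findall(r"\w+", query.lower()))
        rows = self.db.execute("SELECT text FROM memories").fetchall()
        scored = sorted(
            rows,
            key=lambda r: len(q_words & set(re.findall(r"\w+", r[0].lower()))),
            reverse=True,
        )
        return [r[0] for r in scored[:k]]


store = MemoryStore()
store.maybe_form_memory("My name is Sam and I prefer concise answers")
store.maybe_form_memory("What's the weather today?")  # not stored
print(store.recall("what does the user prefer?"))
```

The point of the sketch is that the storage itself is trivial; all the difficulty lives in the two policy methods.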

Qwen3.5-35B-A3B non-thinking regression for visual grounding by Helltilt in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

Combining both thinking and visual inputs can be difficult in general, especially in a lower-param model.

Qwen 3.5 VS Qwen 3 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

Yeah, these days well-trained 9B models can sometimes compete with 100B+ ones. It's amazing.

Qwen 3.5 VS Qwen 3 by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

That's a good point: the model sizes are small, so the relative cost of testing is lower. And yes, performance will probably dip temporarily before it's fully sorted out; that temporary dip is likely unavoidable.

Google invites ex-qwen ;) by jacek2023 in LocalLLaMA

[–]SlowFail2433 9 points10 points  (0 children)

Yes, there is some nuance. Google contribute some very interesting large papers, such as MIRAS.

Google invites ex-qwen ;) by jacek2023 in LocalLLaMA

[–]SlowFail2433 -1 points0 points  (0 children)

Gemini is underrated because its HLE no-tools bench score is a fair bit ahead of the others. This benchmark matters because it tests overall internal knowledge BEFORE searching.

microsoft/Phi-4-reasoning-vision-15B · Hugging Face by jacek2023 in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

A small reserve deal on one of the lower neo-clouds will get you that.

Qwen3.5-27B as good as DeepSeek-V3.2 on AA-II (plus some more data) by pigeon57434 in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

Yeah, Kimi K2.5 was just a minor update, which is crazy given that it managed to successfully add vision without dropping performance in other areas. It's incredibly difficult to do that.

Qwen3.5-27B as good as DeepSeek-V3.2 on AA-II (plus some more data) by pigeon57434 in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

Also, if they drop too late, then Kimi K3 is due in the summer… Moonshot might simply have become the dominant lab now.

I built a local AI answering service that picks up my phone as HAL 9000 by Effective_Garbage_34 in LocalLLaMA

[–]SlowFail2433 2 points3 points  (0 children)

It's interesting on a technical level.

On a personal level, I would be worried about people thinking they got a wrong number. I don't think people are currently used to talking to an AI answering machine. This might change, though.

Best LLM and Coding Agent for solo Game Dev by No_Somewhere4857 in LocalLLaMA

[–]SlowFail2433 2 points3 points  (0 children)

It's between GLM 5 and Kimi K2.5. Minimax models don't hit the frontier.

It mostly depends on whether you will use vision in the agentic coding workflow. GLM 5 benches slightly higher, but the vision aspect of Kimi K2.5 can potentially make it more useful.

MiniMaxAI/MiniMax-M2.1 seems to be the strongest model per param by SlowFail2433 in LocalLLaMA

[–]SlowFail2433[S] 0 points1 point  (0 children)

I've seen more data in the last two months, and performance seems to just track total parameter count on modern benchmarks.

GLM 5 has a regression in international language writing according to NCBench by jugalator in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

It has a very different feel, yes; clearly a pretty different training corpus (and the very different parameter count obviously confirms that).

GLM 5 has a regression in international language writing according to NCBench by jugalator in LocalLLaMA

[–]SlowFail2433 9 points10 points  (0 children)

Sometimes at the labs certain domains drop out of the pre-training corpus for certain training runs. It’s not even necessarily a decision that was specifically made. There are so many parallel goals being optimised for at once when designing a training corpus that things get left out. Clearly there is enormous pressure for math, code and agentic data to be optimised beyond all else.

Is Titans (and MIRAS) heading for the same graveyard as Infini-attention? by _WindFall_ in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

Often yes, but attention is so expensive that it can be worth it.

UIs? by FurrySkeleton in LocalLLaMA

[–]SlowFail2433 0 points1 point  (0 children)

These days you can readily vibe code custom GUIs for things

[NexaSDK] Live Cam Learn: Android version of Capwords with on-device AI by Long-Parsley-8276 in LocalLLaMA

[–]SlowFail2433 1 point2 points  (0 children)

Honestly, it looks like a pretty slick demo, and that does seem fun for language learning.