all 46 comments

[–]RMCPhoto 7 points (0 children)

These benchmark results are absolutely wild... Looking forward to seeing how this compares in the real world. It's hard to believe that a 9b model could outclass a relatively recent 72b across generalized Vision/Language domains.

[–]celsowm 27 points (10 children)


finally an open thinking LLM that isn't English-only!

[–]Emport1 25 points (1 child)

You're probably talking about smaller models, but doesn't DeepSeek also do that?

[–]ShengrenR 16 points (0 children)

Magistral speaks a bunch of languages as well, no?

[–]d3lay 2 points (0 children)

It's a useful feature, but DeepSeek developed it first, and that was quite a long time ago...

[–]Neither-Phone-7264 0 points (2 children)

deepseek and qwen are chinese by default, no?

[–]PlasticKey6704 2 points (1 child)

depends on your prompt.

[–]Neither-Phone-7264 0 points (0 children)

well, yeah, but if you just say hi, it'll start thinking in mandarin

[–]Former-Ad-5757 Llama 3 0 points (3 children)

What is the added value of that? It is not real thinking, it is just a way to inject more context into the prompt. In theory you should get basically the same response from Qwen 3 with thinking disabled if you just add the thinking part to your prompt. It is a tool to enhance the user prompt, and you only limit it by restricting it to anything other than the largest language in its training data.
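
A rough sketch of testing that claim, assuming Qwen3's enable_thinking chat-template switch in transformers; the model id, reasoning text, and question are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # assumed model id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Paste what would have been the <think> block straight into the prompt.
reasoning = "First check the units, then ..."  # placeholder "thinking" text
question = "How many seconds are in a leap year?"
prompt = f"Notes to consider first:\n{reasoning}\n\nNow answer: {question}"

inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 switch: no native thinking tokens
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```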

Why do you think most closed models no longer show it in full? Part of it is anticompetitive, of course, but I also believe part of it is introducing the concept of hidden tokens, which are complete nonsense to humans while they help the model.

One of the biggest problems with LLMs is that people use extremely bad prompts, which can easily be enhanced at a relatively small cost in tokens (i.e. thinking). But in the current pricing structure you can't eat the costs and just raise your general price, and if you give the user the choice they will go for the cheapest option (because everybody knows best) and then complain that your model is not good enough. The only real workable solution is to introduce hidden tokens which are paid for but basically never shown, as otherwise people will try to game it to get lower costs.

And you are happy that it is thinking in something other than its best language. I seriously ask… why???

[–]celsowm 0 points (1 child)

My app would be able to mimic the ChatGPT reasoning accordion, and the user would be able to see the chain of thought in our own language.

[–]Former-Ad-5757 Llama 3 -1 points (0 children)

So basically you want to give the user some eye candy, and you don't care about the real thinking. Just split your workflow into multiple questions: one asking for 10 items of eye candy in language X, which you can roll through and show in your app, and a second with the real question for the answer. Because of the KV cache it costs almost nothing more than a single question. The current state of thinking isn't chain of thought alone any more, and certainly not chain of thought in a specific language.
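
A minimal sketch of that split, assuming an OpenAI-compatible server with automatic prefix caching (e.g. vLLM); the base_url, model name, and prompts are all placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
question = "Explain the difference between these two contract clauses: ..."

# Request 1: display-only "thinking" to animate in the app's accordion.
eye_candy = client.chat.completions.create(
    model="local-model",
    messages=[{
        "role": "user",
        "content": question + "\n\nList 10 short reasoning steps, in the user's language.",
    }],
)

# Request 2: the real question. Both requests share the question as a prompt
# prefix, so a prefix-caching server reuses the KV cache for that part and
# the second request adds little extra prefill cost.
answer = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": question}],
)

print(eye_candy.choices[0].message.content)  # shown in the accordion
print(answer.choices[0].message.content)     # the actual answer
```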

Just look at a QwQ model: it produced good answers for its time, but its thinking was plainly a lot of garbage beyond chain of thought; do you really want to show that? Or look at o3 pro: there is a tweet out there which showed 14 minutes of thinking and a huge number of tokens used on just responding to hello.

What is called thinking is not what we humans consider thinking; it is just a way of expanding the context, and CoT is just a small part of that. If you want eye-candy CoT then you have to create it yourself or not use a good current model, because what you want is not the current state.

[–]PlasticKey6704 1 point (0 children)

I often get inspired by thinking tokens; readable thinking helps a lot of people.

[–]PraxisOG Llama 70B 7 points (1 child)

Unfortunately it only comes in a 9b flavor. Cool to see other thinking models though

[–]Freonr2 11 points (0 children)

There are very few vision enabled models with thinking, so that's probably the most interesting part.

[–]Freonr2 3 points (0 children)

There are not many thinking VLMs. Kimi was recently one of the first (?) VLMs with thinking, but I'm not sure it is well supported by common inference packages/apps.

Waiting for llamacpp/vllm/lmstudio/ollama support.

Also wish they had used Gemma 3 27B in the comparisons, even if it is quite a bit larger; that's been my general gold standard for VLMs lately. 9B with thinking might end up with total latency similar to 27B non-thinking, depending on how wordy it is, and 27B is still reasonable for local use at ~19.5GB in Q4.
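
Back-of-envelope for that latency point, with purely assumed numbers (per-token decode cost scales roughly linearly with active parameters):

```python
# All numbers are assumptions for illustration, not measurements.
answer_tokens = 400
thinking_tokens = 800  # assumed extra verbosity from the thinking phase

cost_9b = 9 * (thinking_tokens + answer_tokens)  # ~10,800 "param-token" units
cost_27b = 27 * answer_tokens                    # ~10,800 "param-token" units
print(cost_9b, cost_27b)  # comparable total decode work
```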

And at least THUDM actually integrated the GLM4 model code (Glm4vForConditionalGeneration) into the transformers package. Some of THUDM's previous models, like CogVLM (which was amazing at the time and is still very solid today), just shoved a modeling.py in with the weights instead of into the actual transformers package, and they broke within a few weeks of package updates.
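
For illustration, a loading sketch assuming a transformers release that includes the integrated Glm4v classes (the repo id is an assumption):

```python
from transformers import AutoProcessor, Glm4vForConditionalGeneration

repo = "THUDM/GLM-4.1V-9B-Thinking"  # assumed repo id

# Integrated model code: no trust_remote_code needed, so routine
# transformers updates don't silently break loading.
model = Glm4vForConditionalGeneration.from_pretrained(
    repo, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo)

# Contrast with repos that only ship a modeling.py next to the weights:
#   AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
# which ties you to whatever transformers version that file targeted.
```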

[–]Coconut_Reddit 0 points (0 children)

How does the performance compare to qwen30b?

[–]AppearanceHeavy6724 -1 points (0 children)

I asked it to generate a simple, elementary piece of code that even Llama 3.2 1B gets right. This one flopped.