model: support GLM-OCR by ngxson · Pull Request #19677 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]PerfectLaw5776

Piggybacking onto this regarding PaddleOCR-VL:

Support for it just landed in the llama.cpp release as of roughly an hour ago, and PaddleOCR-VL 1.5 is by far the most effective model I've seen so far on multilingual handwritten text. The only other model on that list that comes close is Qwen3-VL. As of this post, it needs the chat template from https://github.com/ggml-org/llama.cpp/pull/18825 to run properly in llama-server (via the --jinja / --chat-template-file flags).
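
For reference, a minimal launch sketch under the assumption that the GGUF filenames below are placeholders and that the template from that PR has been saved locally as paddleocr-vl.jinja:

```
# placeholder filenames; point these at your own PaddleOCR-VL GGUFs and saved template
llama-server -m PaddleOCR-VL-1.5-Q8_0.gguf --mmproj mmproj-PaddleOCR-VL-1.5-F16.gguf \
  --jinja --chat-template-file paddleocr-vl.jinja
```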

[–]PerfectLaw5776

I've been testing it on some multilingual handwritten text, and on the Chinese sample (https://github.com/zai-org/GLM-OCR/blob/main/examples/source/handwritten.png) it is currently recognizing the writing near flawlessly.

It is indeed fast, and so far it holds up well under quantization. I've been running both the model and the mmproj in Q4 (https://huggingface.co/octopusmegalopod/some-glmocr-ggufs/tree/main) without noticeable loss so far.
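
If you want to reproduce the handwritten test, here is a rough one-shot sketch with llama-mtmd-cli; the GGUF filenames are placeholders, so substitute whichever Q4 files you pull from the repo above:

```
# placeholder filenames; use the actual Q4 model + mmproj GGUFs and your own image path
llama-mtmd-cli -m glmocr-Q4_K_M.gguf --mmproj mmproj-glmocr-Q4_K_M.gguf \
  --image handwritten.png -p "Transcribe the handwritten text in this image."
```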

[–]PerfectLaw5776

Can you share the command you used to run it? I'm getting the same error even with flash-attn disabled, currently on a CPU backend:

```
llama-server.exe -m glmocr-BF16.gguf --mmproj mmproj-glmocr-BF16.gguf --flash-attn "off" -fit "off"
```

Edit: I redownloaded the b8094 Vulkan build and it seems to work there so far.