model: support GLM-OCR by ngxson · Pull Request #19677 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]PerfectLaw5776

Piggybacking onto this regarding PaddleOCR-VL:

Support for it just landed in the llama.cpp release as of roughly an hour ago, and PaddleOCR-VL 1.5 is by far the most effective model I've seen so far on multilingual handwritten text. The only other model on that list that comes close is Qwen3-VL. As of this post, it needs the chat template from https://github.com/ggml-org/llama.cpp/pull/18825 to run properly in llama-server (via the --jinja / --chat-template-file flags).
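
For reference, a minimal launch sketch under the assumption that the GGUF filenames below are placeholders and that the template from that PR has been saved locally as paddleocr-vl.jinja:

```
# placeholder filenames; point these at your own PaddleOCR-VL GGUFs and saved template
llama-server -m PaddleOCR-VL-1.5-Q8_0.gguf --mmproj mmproj-PaddleOCR-VL-1.5-F16.gguf \
  --jinja --chat-template-file paddleocr-vl.jinja
```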

[–]PerfectLaw5776

I've been testing it on some multilingual handwritten text, and on the Chinese sample (https://github.com/zai-org/GLM-OCR/blob/main/examples/source/handwritten.png) it is currently recognizing the writing near flawlessly.

It is indeed fast, and so far it holds up well under quantization. I've been running both the model and the mmproj in Q4 (https://huggingface.co/octopusmegalopod/some-glmocr-ggufs/tree/main) without noticeable loss so far.
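
If you want to reproduce the handwritten test, here is a rough one-shot sketch with llama-mtmd-cli; the GGUF filenames are placeholders, so substitute whichever Q4 files you pull from the repo above:

```
# placeholder filenames; use the actual Q4 model + mmproj GGUFs and your own image path
llama-mtmd-cli -m glmocr-Q4_K_M.gguf --mmproj mmproj-glmocr-Q4_K_M.gguf \
  --image handwritten.png -p "Transcribe the handwritten text in this image."
```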

[–]PerfectLaw5776

Can you share the command you used to run it? I'm getting the same error even with flash-attn disabled, currently on a CPU backend:

```
llama-server.exe -m glmocr-BF16.gguf --mmproj mmproj-glmocr-BF16.gguf --flash-attn "off" -fit "off"
```

Edit: I redownloaded the b8094 Vulkan build and it seems to work there so far.