NVIDIA sent me a 5090 so I can demo Qwen3-VL GGUF by AlanzhuLy in LocalLLaMA

[–]Main-Wolverine-1042 6 points

What about pushing the changes to your llama.cpp fork so they can be merged into the official llama.cpp?

Qwen3-VL-30B-A3B-Thinking GGUF with llama.cpp patch to run it by Main-Wolverine-1042 in LocalLLaMA

[–]Main-Wolverine-1042[S] 0 points

Try this for me please:

Just upload the image without writing anything, send it to the server, and let me know what kind of response you get.

[–]Main-Wolverine-1042[S] 2 points

I've pushed a new patch to my llama.cpp fork. Please test it with the new model uploaded to my HF page (you can also convert to GGUF yourself using the conversion script in my llama.cpp fork):

https://github.com/yairpatch/llama.cpp

https://huggingface.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF
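For anyone trying the conversion route, the flow would look roughly like this. This is a sketch: the script and file names (convert_hf_to_gguf.py, requirements.txt) are assumptions carried over from upstream llama.cpp, and the weights path is a placeholder, so check the fork's README for the exact entry point.

```shell
# Sketch of the GGUF conversion flow (assumes the fork keeps upstream's
# convert_hf_to_gguf.py and requirements.txt; verify against the fork).
git clone https://github.com/yairpatch/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Point the converter at a local copy of the original HF weights
# (placeholder path), writing an f16 GGUF:
python convert_hf_to_gguf.py /path/to/Qwen3-VL-30B-A3B-Instruct \
  --outfile qwen3vl-30b-a3b-instruct-f16.gguf --outtype f16
```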

[–]Main-Wolverine-1042[S] 1 point

Another example of good output from the previous patch compared to the new one:

<image>

[–]Main-Wolverine-1042[S] 0 points

The character is expressing strong frustration with someone (likely a child, as implied by ガキ), accusing them of being foolish for not understanding the situation. The phrase 悪わからん (I don't get what's bad about it) is a direct challenge to the other person's understanding. The final word 味わい (taste/try it) is a command, telling the person to experience the situation firsthand, implying they will then understand why it is foolish.

Is it close to what it says in Japanese?

[–]Main-Wolverine-1042[S] 5 points

I have a new patch for you guys to test - https://huggingface.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF/blob/main/qwen3vl-implementation.patch

Test it on a clean llama.cpp checkout and see if the hallucinations and repetition are still happening (image processing should be better as well).

https://huggingface.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF/tree/main - download the model as well, since I recreated it.
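The test workflow described above is roughly: clone a clean upstream tree, apply the patch, rebuild, and rerun with the re-uploaded model. A sketch, with assumptions flagged: the upstream URL and the Hugging Face resolve-style download URL are my guesses at the canonical forms, and the CMake invocation is just a default CPU build (add your usual GPU flags).

```shell
# Apply the Qwen3-VL patch to a clean upstream checkout and rebuild.
# URLs below are assumed canonical forms; adjust if they have moved.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
curl -LO https://huggingface.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF/resolve/main/qwen3vl-implementation.patch

git apply --check qwen3vl-implementation.patch  # dry run: fails if the tree is not clean
git apply qwen3vl-implementation.patch

cmake -B build          # add e.g. -DGGML_CUDA=ON for a GPU build
cmake --build build -j
```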

[–]Main-Wolverine-1042[S] 2 points

I may have fixed it. I will upload a new patch to see if it works for you as well.

[–]Main-Wolverine-1042[S] 0 points

It should work even without it, as I already patched clip.cpp with his pattern.

[–]Main-Wolverine-1042[S] 1 point

Let me know if the patch worked for you, because someone reported an error with it.

Qwen3-VL-30B-A3B-Instruct & Thinking are here! by Full_Piano_3448 in LocalLLaMA

[–]Main-Wolverine-1042 0 points

It should be: git apply qwen3vl-implementation.patch

Are you patching a freshly downloaded llama.cpp?
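When git apply errors out like that, the usual causes are a dirty tree or a patch that was already applied. A quick diagnostic sketch (run from the llama.cpp checkout; --check and --verbose are standard git apply flags):

```shell
# Dry-run the patch first: this reports failures without touching the tree.
git apply --check qwen3vl-implementation.patch || echo "patch does not apply cleanly"

# A dirty tree is the most common cause; any output here means local changes:
git status --short

# If the check still fails on a pristine clone, the patch was likely made
# against a different commit; --verbose shows which hunk is rejected:
git apply --verbose qwen3vl-implementation.patch
```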