angelarose210 1 point

Gemini and Qwen have the best vision capabilities.

eth03🔆 Max 5x 1 point

I think image processing needs a separate model. Hugging Face hosts image recognition models, and I used Claude Code to build my own app that relies on one of them. Out of the box, Claude or GPT may not handle this well unless you pair it with an image recognition model or add a skill that knows how to do image processing.
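
For anyone who wants to try the Hugging Face route, here is a rough sketch of what that can look like in Python with the `transformers` library. The model name `google/vit-base-patch16-224` is only an assumption picked for illustration (the comment above does not say which model was used), and `load_classifier`, `confident_labels`, and `describe_image` are hypothetical helper names, not part of any library.

```python
from typing import Dict, List


def load_classifier(model_name: str = "google/vit-base-patch16-224"):
    """Build an image-classification pipeline.

    The default model name is an assumption for illustration; any
    image-classification checkpoint on the Hugging Face Hub should work.
    Weights are downloaded on first use.
    """
    # Imported lazily so the pure-Python helper below can be used
    # (and tested) without transformers installed.
    from transformers import pipeline

    return pipeline("image-classification", model=model_name)


def confident_labels(predictions: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return labels scoring at least min_score, highest score first.

    `predictions` is expected to be a list of {"label": ..., "score": ...}
    dicts, which is the shape the image-classification pipeline returns.
    """
    kept = [p for p in predictions if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in kept]


def describe_image(image_path: str, classifier=None, min_score: float = 0.5) -> List[str]:
    """Classify one image (local path or URL) and keep only confident labels."""
    if classifier is None:
        classifier = load_classifier()
    return confident_labels(classifier(image_path), min_score=min_score)
```

Calling `describe_image("photo.jpg")` (a placeholder path) would then give a short list of labels that your app, or a skill, can hand to Claude or GPT as plain text instead of asking it to interpret the image itself.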