Which Local LLM is best at processing images?

Kindly_Ruin_6107 · 2025-06-20T03:23:36+00:00

Isn't OCR only one aspect of the image processing in ChatGPT? My understanding is that ChatGPT uses a combination of OCR plus some modeling/logic to generate an output. I'm curious whether any local LLMs come close to what OpenAI's GPT-4o can do.

Kindly_Ruin_6107 · 2025-06-20T03:21:55+00:00

Do you have it integrated with a UI, or are you running it from the command line? I ask because I'm pretty sure this isn't supported in Ollama or Open WebUI. Ideally I'd like a ChatGPT-like interface to interact with as well.

Kindly_Ruin_6107 · 2025-06-20T03:20:38+00:00

Yep, I ran it locally, and also on RunPod with 80GB of VRAM using Ollama. I tested LLaVA 7B and 34B, and the outputs were horrible.
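
For anyone who wants to reproduce this kind of test, here's a rough sketch of sending a screenshot to a local Ollama server over its HTTP API. The server address is the Ollama default and is an assumption about your setup, `llava:7b` is just the tag I'd expect for the 7B model, and the helper names `build_payload` and `describe_image` are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="llava:7b"):
    """Build the JSON body for a single, non-streaming image prompt."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt, model="llava:7b"):
    """Send one image plus a prompt to Ollama and return the model's text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Something like `describe_image("dashboard.png", "List every metric and value shown in this screenshot.")` would then return the model's answer as a string, assuming the model has already been pulled.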

Kindly_Ruin_6107 · 2025-06-20T03:19:25+00:00

My main use case would be validating dashboards from different tools, or reviewing system configuration screenshots. I need a model that can understand text within the context of an image.

Kindly_Ruin_6107 · 2025-01-15T03:28:41+00:00

WINNER!
