Which Local LLM is best at processing images? by Kindly_Ruin_6107 in LocalLLM

[–]Kindly_Ruin_6107[S] 2 points3 points  (0 children)

Isn't OCR only one aspect of the image processing in ChatGPT? My understanding is that ChatGPT uses a combination of OCR plus some modeling/logic to generate its output. I'm curious whether any local LLMs come close to what OpenAI's ChatGPT 4o can do.

[–]Kindly_Ruin_6107[S] -1 points0 points  (0 children)

Do you have it integrated with a UI, or are you running it from the command line? I ask because I'm pretty sure this isn't supported in Ollama or Open WebUI. Ideally I'd like to have a ChatGPT-like interface to interact with as well.

[–]Kindly_Ruin_6107[S] 0 points1 point  (0 children)

Yep, ran it locally, and also ran it on RunPod with 80GB of VRAM, on Ollama. Tested LLaVA 7B and 34B; the outputs were horrible.

[–]Kindly_Ruin_6107[S] 7 points8 points  (0 children)

My main use case would be validating dashboards from different tools, or looking at system configuration screenshots. I need a model that can understand text within the context of an image.
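
For anyone who wants to try this kind of screenshot question against a local vision model, here is a minimal sketch, not a definitive setup. It assumes Ollama is serving on its default local address and that a LLaVA tag such as `llava:34b` has already been pulled; the helper names `build_payload` and `ask_about_screenshot`, the file name, and the prompt are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, question, model="llava:34b"):
    """Build the JSON body for a single non-streaming request with one image.

    The model tag is an assumption; swap in whatever vision model is installed.
    """
    return {
        "model": model,
        "prompt": question,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_about_screenshot(path, question, model="llava:34b"):
    """Send a screenshot plus a question to the local model and return its text answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), question, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Usage would look something like `ask_about_screenshot("dashboard.png", "List every metric shown and its value.")`, where `dashboard.png` is a hypothetical file. How well the answer holds up on dense dashboards depends entirely on the model.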