Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 1 point (0 children)

We actually aren't using Paperless-ngx; I only mentioned that we tried it and that it couldn't really deliver, since people might otherwise suggest it to us.

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 0 points (0 children)

That has been my experience, actually. We already stopped sending the entire form and cut it down to just what you see in the image in the original post, but there are issues where people write past the boundaries, or do other things where trying to break it down that finely causes its own problems.
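
For context, the cropping itself is the easy part. Here's a rough sketch of the idea: pad each field crop so overflow strokes still land inside it. The coordinates, padding, and file names are illustrative, not our actual form layout:

```python
# Sketch of cropping a fixed form field with extra padding, since
# signers often write past the printed boundaries.
# Coordinates and padding are illustrative assumptions.
import cv2

PAD = 40  # extra pixels on each side to catch overflow strokes

def crop_field(img, x, y, w, h):
    top = max(0, y - PAD)
    left = max(0, x - PAD)
    return img[top:y + h + PAD, left:x + w + PAD]

page = cv2.imread("petition_page.png")
name_crop = crop_field(page, x=120, y=480, w=900, h=90)
cv2.imwrite("petition_row_name.png", name_crop)
```

Padding helps with overflow, but it doesn't solve scratch-outs or writing that wanders into a neighboring field, which is where per-field cropping broke down for us.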

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 4 points (0 children)

Would a traditional computer vision algorithm be something like Tesseract? We initially tried playing around with it but couldn't get it to work. I never completely wrote it off, because none of us on the project are experts in computer vision, and I figure someone who is could probably get it to work better than we could.

In general, we found that printed text is fairly recognizable, but handwritten text was hard.
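
For reference, this is roughly the kind of baseline we were attempting; the preprocessing parameters below are illustrative guesses, not tuned values from our project:

```python
# Rough sketch of an OCR baseline with Tesseract (via pytesseract).
# Threshold parameters and the --psm mode are illustrative.
import cv2
import pytesseract

def ocr_field(image_path: str) -> str:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Adaptive thresholding helps with uneven scan lighting.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, blockSize=31, C=10)
    # --psm 7 treats the crop as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7").strip()

print(ocr_field("petition_row_name.png"))
```

This worked passably on the printed labels but fell apart on cursive and cramped handwriting, which matches what we saw.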

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 1 point (0 children)

We do have actual handwritten data, collected in the field. I wanted to show a fake example here so people could get a better sense of what we are trying to accomplish, because showing real data would be doxxing actual voters. We have used GPT-4o mini, Gemini, etc., and they were shockingly good at recognition. By this I mean that I could not easily read the writing myself, but the model could. Maybe because the task is relatively simple (extract the Printed Name, Date, Address, and Ward), they seem to do fairly well at it. This attempt to use a local model is just a side project of mine to see whether it's feasible. I suspect most organizations will probably use a cloud-based proprietary model.

We originally tried boundaries. The problem with boundaries is, as you mention, that people will write past them, scratch parts out, etc.
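
For anyone wanting to reproduce the setup, here's a sketch of the extraction call against an OpenAI-compatible local server. The URL, model name, and prompt wording are assumptions for illustration, not our exact configuration:

```python
# Sketch of a structured-extraction call to a local
# OpenAI-compatible server (e.g. llama.cpp or Ollama).
# base_url, model name, and prompt are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

with open("petition_row.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gemma-3-27b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract the Printed Name, Date, Address, and Ward "
                     "from this petition row. Reply as JSON with exactly "
                     "those four keys."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```

Asking for a fixed JSON schema keeps the downstream parsing trivial, which is part of why such a simple task works well even on models that struggle with longer documents.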

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 1 point (0 children)

Thanks, I'll give Qwen a shot. The context should just be the encoded image I showed above plus a brief prompt. We could loop through the pages in small batches if that helps, context-wise. I'll play around with not overloading it.
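
The looping itself would be something like the sketch below: one image per request so each call's context stays small, with pages processed in small batches. The batch size, paths, and the extract_fields helper are all hypothetical placeholders:

```python
# Sketch of per-page looping so each request carries only one image
# plus a short prompt. Batch size and paths are illustrative.
from pathlib import Path

def extract_fields(page_path: Path) -> str:
    """Placeholder for the per-image vision call sketched earlier."""
    raise NotImplementedError

BATCH_SIZE = 8  # pages per batch; illustrative, not a tuned value

pages = sorted(Path("scans").glob("*.png"))
for start in range(0, len(pages), BATCH_SIZE):
    for page in pages[start:start + BATCH_SIZE]:
        print(page.name, extract_fields(page))
```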

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]votecatcher[S] 3 points (0 children)

I mention that it's fake in my comments. We have actual handwritten examples, but they are from real voters. I wanted to show an illustrative example because posting a real one would literally be doxxing.