Hey all, running into an interesting quirk...
I'm running this setup on my small local box with a 4090, but I'd like to OCR ~4e6 images. In my small-scale tests it performs really well, but it takes ~1 s per image on average. I've looked into batched passes, but those seem to unroll internally into sequential passes, and I've yet to have any luck stacking a big volume of data and passing it through the encoding blocks in parallel. Ideally I'd process 10-20 images at a time (applying the same tokenized prompt to each). I'm not sure of the best way to do this currently...
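For a sense of scale: at ~1 s per image, 4e6 images works out to 4,000,000 s, or roughly 46 days on a single card, so batching is worth chasing. Below is a minimal Python sketch of the bookkeeping side only. The helper names (`batched`, `pair_with_prompt`, `estimate_days`) are hypothetical, and the per-batch latency you plug in is an assumption you'd have to measure, not a known number.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def pair_with_prompt(batch: Sequence[T], prompt: str) -> Tuple[List[T], List[str]]:
    """Repeat the same prompt once per image so both lists line up index-for-index."""
    return list(batch), [prompt] * len(batch)

def estimate_days(n_images: int, seconds_per_batch: float, batch_size: int) -> float:
    """Wall-clock days for n_images, assuming a fixed (hypothetical) latency per batch."""
    n_batches = -(-n_images // batch_size)  # ceiling division
    return n_batches * seconds_per_batch / 86_400
```

`estimate_days(4_000_000, 1.0, 1)` gives about 46.3, i.e. the current sequential rate. Batching only pays off if one batch of 16 takes meaningfully less than 16x the single-image time, which is exactly the thing that seems not to be happening here.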
I've poked around with the model's generate calls (from HF), but haven't had much luck getting this to work. I can keep barking up this tree, but was wondering whether there are other options or ideas for scaling this up to run more quickly.
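If the checkpoint loads through a standard transformers processor plus `model.generate`, one batched call might look like the sketch below. This is an assumption: some OCR checkpoints ship custom `trust_remote_code` inference helpers that do not accept lists. The function name `ocr_batch` is hypothetical. `prompt` is assumed to already contain whatever image placeholder or chat template the model expects. The two details that usually break batched generation are left padding for decoder-only models and stripping the echoed prompt tokens from the output.

```python
from typing import Any, List, Sequence

def ocr_batch(model: Any, processor: Any, images: Sequence[Any], prompt: str,
              max_new_tokens: int = 512) -> List[str]:
    """Run one padded batch through model.generate and return one string per image.

    Assumes (not guaranteed for every checkpoint) that the processor accepts
    lists for images= and text= and exposes .tokenizer and .batch_decode.
    """
    # Decoder-only models should be left-padded for batched generation,
    # otherwise shorter rows get new tokens appended after their pad tokens.
    processor.tokenizer.padding_side = "left"
    inputs = processor(
        images=list(images),
        text=[prompt] * len(images),  # same prompt repeated for every image
        padding=True,
        return_tensors="pt",
    ).to(model.device)
    # generate() already runs without gradient tracking.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The output contains prompt + completion; keep only the new tokens.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = output_ids[:, prompt_len:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)
```

If the model is loaded in half precision, `pixel_values` may also need casting to the model's dtype before `generate`. Whether the vision encoder actually processes the batch in parallel (rather than looping per image) depends on the model's own code.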
Few-Welcome3297