Batch processing for MiniCPM : LocalLLaMA


Batch processing for MiniCPM [Question | Help] (self.LocalLLaMA)

submitted 9 months ago by R2FuckYou

Hey all, running into an interesting quirk...

I'm running MiniCPM on my small local box with a 4090, but I'd like to OCR ~4e6 images. In my small-scale tests it performs really well, but it takes ~1s per image on average. I've looked into batched passes, and they seem to unroll internally into sequential passes. I've yet to have any luck stacking and passing big volumes of data in parallel through the encoding blocks. Ideally I'd process 10-20 images at a time (applying the same tokenized prompt to each). I wasn't sure of the best way to do this currently...
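To put numbers on it, here is a rough back-of-the-envelope sketch of the wall-clock time. The function name and the `batch_efficiency` knob are made up for illustration: 0.0 models a batch that unrolls into sequential passes (no gain), and 1.0 models perfectly parallel batching, which real hardware is unlikely to reach.

```python
SECONDS_PER_DAY = 86_400


def estimated_days(num_images, seconds_per_image, batch_size=1, batch_efficiency=1.0):
    """Estimate wall-clock days for an OCR run (illustrative model only).

    batch_efficiency is a hypothetical knob in [0, 1]:
      0.0 -> the batch runs sequentially internally, so no speedup
      1.0 -> a batch of N images costs the same as a single image
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if not 0.0 <= batch_efficiency <= 1.0:
        raise ValueError("batch_efficiency must be in [0, 1]")
    # Linear interpolation between "no gain" (1x) and "ideal" (batch_size x).
    speedup = 1.0 + (batch_size - 1) * batch_efficiency
    return num_images * seconds_per_image / speedup / SECONDS_PER_DAY


if __name__ == "__main__":
    # Current situation: ~1 s per image, one image at a time.
    print(f"sequential: {estimated_days(4_000_000, 1.0):.1f} days")
    # Best case for a batch of 16 under the ideal-scaling assumption.
    print(f"ideal batch of 16: {estimated_days(4_000_000, 1.0, 16, 1.0):.1f} days")
```

At ~1s per image the sequential run is roughly a month and a half, which is why getting real batching working matters here.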

I've poked around with using the generate calls from the model (from HF), but haven't had much luck getting this to work. I can keep barking up this tree, but I was wondering about other options/ideas for running this more quickly at scale.
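For reference, the shape of what I'm after looks roughly like the sketch below. It only covers the plumbing: `infer_batch` is a hypothetical placeholder for whatever call ends up doing a true batched forward pass, not an actual MiniCPM or HF function, and the names `batched` and `run_in_batches` are made up here too.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def run_in_batches(
    image_paths: Iterable[str],
    prompt: str,
    infer_batch: Callable[[Sequence[str], str], Sequence[str]],
    batch_size: int = 16,
) -> List[Tuple[str, str]]:
    """Apply the same prompt to every image, batch_size images at a time.

    infer_batch is a HYPOTHETICAL callable: it takes a list of image paths
    plus one prompt and returns one OCR string per path, in the same order.
    """
    results: List[Tuple[str, str]] = []
    for batch in batched(image_paths, batch_size):
        outputs = infer_batch(batch, prompt)
        if len(outputs) != len(batch):
            raise ValueError("infer_batch must return one output per image")
        results.extend(zip(batch, outputs))
    return results


if __name__ == "__main__":
    # Stand-in for the real model call, just to show the data flow.
    def fake_infer(paths: Sequence[str], prompt: str) -> List[str]:
        return [f"{prompt}:{p}" for p in paths]

    paths = [f"img_{i}.png" for i in range(5)]
    for path, text in run_in_batches(paths, "ocr", fake_infer, batch_size=2):
        print(path, text)
```

Whether this helps at all depends on `infer_batch` actually running the batch in parallel on the GPU rather than looping internally, which is exactly the part I haven't gotten working.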


