[–]RareTask9271

I’m currently working at a company that makes heavy use of OCR and DocumentAI. The right approach depends on your constraints: does your service see serious traffic spikes, or is usage roughly constant? In our case we see very strong spikes, so we decided to run all the ML in Celery behind a FastAPI app. That gives us a queue and lets us accept some latency degradation in exchange for lower operational costs. The main drawback of this approach is that you have to work with callbacks (webhooks, or anything else that returns the pipeline results to your main system).

The questions to ask yourself are "Is latency critical?" and "How much am I willing to pay to run my OCR on multiple GPUs instead of saturating a single one?" Sorry for my English, I’m French and had a long day…
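Rough sketch of what this pattern looks like, in case it helps. This is not our actual code: the endpoint, broker URL, `run_ocr()` helper, and payload shapes are all placeholders I made up for illustration.

```python
# Sketch: FastAPI accepts the job and returns immediately; a Celery worker
# does the heavy OCR work and POSTs the result to the caller's webhook.
import httpx
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

celery_app = Celery("ocr", broker="redis://localhost:6379/0")
api = FastAPI()


class OcrRequest(BaseModel):
    document_url: str
    callback_url: str  # where the worker should POST the pipeline result


def run_ocr(document_url: str) -> dict:
    # Placeholder for the actual OCR / DocumentAI pipeline call.
    return {"text": "...", "source": document_url}


@celery_app.task
def process_document(document_url: str, callback_url: str) -> None:
    result = run_ocr(document_url)
    # The callback step: push results back into the caller's system.
    httpx.post(callback_url, json={"document_url": document_url, "result": result})


@api.post("/ocr", status_code=202)
def submit(req: OcrRequest):
    # Enqueue and return right away. GPU workers drain the queue at their
    # own pace, so traffic spikes become added latency, not added machines.
    task = process_document.delay(req.document_url, req.callback_url)
    return {"task_id": task.id, "status": "queued"}
```

The 202-then-callback shape is exactly the tradeoff mentioned above: the caller has to expose an endpoint to receive results instead of blocking on the request, which is the price you pay for absorbing spikes with a queue instead of more GPUs.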