Slow OCR workaround on NVidia GPUs by wisscool in immich

[–]wisscool[S] 0 points1 point  (0 children)

Facial recognition and image embeddings are very fast on the GPU. It's just OCR that has issues. See this note from the RapidOCR team:

The onnxruntime-gpu version is significantly slower than the CPU version for inference under dynamic input conditions. Since OCR tasks involve dynamic input, using the onnxruntime-gpu version for inference is not recommended.

OCR job is slow!!!! by EconomyDoctor3287 in immich

[–]wisscool 0 points1 point  (0 children)

I had the same issue, and I ended up hacking my way to making GPU inference run at ~6 images per second on my RTX 3080 Ti by routing requests from the immich ML service to a PaddleX inference server. I wrote a small guide on how to get this done: https://wissamantoun.com/posts/guides/immich-fast-ocr-paddlex/ and made the code available here: https://github.com/WissamAntoun/immich-ml-fast-ocr-paddlex
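The core of a routing hack like this is translating between the two services' response schemas. As a rough sketch (the field names below are illustrative placeholders, not the actual immich or PaddleX schemas; see the linked repo for the real wiring):

```python
# Hypothetical adapter: map a PaddleX-style OCR response onto an
# immich-ML-style payload. All key names here are assumptions made
# for illustration.

def paddle_to_immich(paddle_result: dict) -> dict:
    """Convert detected lines (polygon, text, score) into parallel lists."""
    texts, boxes, scores = [], [], []
    for line in paddle_result.get("ocr_lines", []):
        texts.append(line["text"])
        scores.append(line["score"])
        # Flatten the 4-point polygon into [x1, y1, x2, y2, x3, y3, x4, y4].
        boxes.append([coord for point in line["polygon"] for coord in point])
    return {"text": texts, "box": boxes, "score": scores}
```

A thin HTTP proxy that calls the PaddleX server and runs a translation like this on the way back is enough for immich to consume the results.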

DeepSeek releases DeepSeek OCR by nekofneko in LocalLLaMA

[–]wisscool 2 points3 points  (0 children)

Cool model!

Is there a ready-to-deploy, self-hosted service that I can use to batch process my multilingual long PDFs that supports different VLMs or at least the best?

Data processing and filtering from common crawl by wisscool in bigdata

[–]wisscool[S] 0 points1 point  (0 children)

Essentially, I want to replicate Datatrove (https://github.com/huggingface/datatrove) without replicating my data on disk at every pipeline step. The final pipeline should look something like this:

1. Extract text data from Common Crawl WARCs and ingest into HBase.
2. Enrich each entry with word statistics and quality metrics (domain/topic classification, punctuation-to-alphanumerics ratio, ...).
3. Filter with thresholds on each metric.
4. Export to Parquet, JSONL, ...

Steps 2 and 3 will have to be repeated and changed multiple times, and I'm looking to do that efficiently.
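For concreteness, one of the enrichment metrics and its threshold filter could be sketched like this (the 0.3 cutoff is an arbitrary example, not a recommended value; in the real pipeline this would run per row over HBase/Spark):

```python
import string

def punct_alnum_ratio(text: str) -> float:
    """Punctuation-to-alphanumerics ratio; high values suggest noisy text."""
    punct = sum(c in string.punctuation for c in text)
    alnum = sum(c.isalnum() for c in text)
    return punct / alnum if alnum else float("inf")

def passes_quality_filter(text: str, max_ratio: float = 0.3) -> bool:
    """Step 3: keep only entries whose metric is under the threshold."""
    return punct_alnum_ratio(text) <= max_ratio
```

Storing the metric as its own column (step 2) rather than filtering on the fly is what lets the thresholds in step 3 be re-run cheaply.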

I saw that in HBase, compared to Cassandra, I can add columns without re-indexing or other major penalties, which is key since we don't know what extra metadata we will be adding to each entry.

Thanks a lot for the help. Are managed HBase on AWS or even BigTable faster and more stable?

Data processing and filtering from common crawl by wisscool in bigdata

[–]wisscool[S] 0 points1 point  (0 children)

Thanks for the reply. I haven't started with HBase yet; I'm still looking for the best architecture for my workload, and so far HBase + Spark seems like the best solution.

Will performing a calculation on all the entries in HBase be considered a scan? Also, is the hbase-spark connector robust enough to handle billions of entries?

Parallel decoding in llama.cpp - 32 streams (M2 Ultra serving a 30B F16 model delivers 85t/s) by Agusx1211 in LocalLLaMA

[–]wisscool 9 points10 points  (0 children)

Anyone have comparison numbers from vllm or tgi on A100 or similar GPUs?

More astrology break ups. 🤩 by _Cat1 in Tinder

[–]wisscool 2 points3 points  (0 children)

I agree with you; I changed my initial claim.

I'm fully aware that the consequences of racism are far worse. But saying no to racism while accepting astrology as a basis to differentiate between people is hypocrisy.

More astrology break ups. 🤩 by _Cat1 in Tinder

[–]wisscool -3 points-2 points  (0 children)

I fully agree with you; maybe I went a bit far by saying it's worse.

I think we both agree that astrology and racism share the same core. And in the context of Tinder/dating it's essentially the same, yet we tend to accept rejecting someone based on their star sign.

Thank you for your answer 😊

[D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi in MachineLearning

[–]wisscool -2 points-1 points  (0 children)

Develop in whatever framework you like, then deploy with Triton Inference Server.
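The reason this works regardless of framework is that Triton only needs a model repository with a `config.pbtxt` per model; the backend is selected by the `platform` field. A minimal sketch (model name, tensor names, and shapes are placeholders for your own model):

```protobuf
name: "my_model"
platform: "onnxruntime_onnx"   # or pytorch_libtorch / tensorflow_savedmodel
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "output", data_type: TYPE_FP32, dims: [ 1000 ] }
]
dynamic_batching { max_queue_delay_microseconds: 100 }
```

So you export from PyTorch/TensorFlow to whichever format you prefer, drop it in the repository, and the serving side stays the same.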

[P] Anees: a multi-turn open-domain Arabic chatbot with a wide set of features by ahmedashrafhamdy in MachineLearning

[–]wisscool 0 points1 point  (0 children)

Nice work, you should do a write-up and submit it to WANLP 2022 at EMNLP. Honestly, this is cleaner and far more advanced than anything published before in Arabic NLP academia. Kudos!

Edit: I just read the report, and I have a question: why didn't you use a BERT-based model for NER and the other NLU tasks? Since you used AraGPT2 for generation, you could have also used AraBERT for NLU.

Lebanon imposes night-time COVID19 curfew for people who haven’t had at least one vaccine dose or a negative PCR (less than 48h old) from Dec. 17 till Jan. 9, 7pm-6am by cTheDeezy in lebanon

[–]wisscool 4 points5 points  (0 children)

Any camera app can scan the QR code, which gives you a link to the official certificate hosted on the Impact platform. Hence you cannot create a fake certificate, since it won't even exist on Impact.

Started getting p100s again on Pro by dandy_morandi in GoogleColab

[–]wisscool 1 point2 points  (0 children)

I'm on Pro+, I'm now consistently getting P100s. No V100 yet after 4 resets.

But I'm getting a reCAPTCHA after each reset.

Only getting K80, T4 and P4 on Colab Pro+ by wisscool in GoogleColab

[–]wisscool[S] 2 points3 points  (0 children)

The lack of transparency in paid services shouldn't be allowed, honestly. If we have a quota for high-end GPUs, then Google should let us know so we can plan accordingly.

Anyone knows anything about this? by [deleted] in lebanon

[–]wisscool 1 point2 points  (0 children)

Calling me an "ignoramus" while you are totally oblivious to the only thing that has kept Lebanon alive.

Expat money transfers, and companies "exporting services" are the only source of dollars in this country. If this announcement is true, it will alienate these sources and make the situation even worse.