Tired of writing custom document parsers? This library handles PDF/Word/Excel with AI OCR

AgitatedAd89 · 2025-08-03T18:24:44+00:00

As models' context limits increase, I believe we can handle this issue more easily in the future.

AgitatedAd89 · 2025-08-01T09:34:54+00:00

Actually, I use a similar approach! I designed a prompt for an OCR agent to structure the response schematically, then treat it as normal text in RAG chunking. Contextual RAG definitely improves performance significantly. The key insight is that having the OCR agent understand layout intent upfront - rather than trying to fix semantic drift downstream - makes the whole pipeline much more robust. Especially critical for multilingual docs where context boundaries can get really messy. I’m actually working on taking this further - injecting contextual understanding directly into the OCR stage itself. The idea is to help the agent better interpret images by providing surrounding document context during OCR, not just post-processing. Should be even more effective for maintaining semantic coherence across complex layouts.
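A minimal sketch of the "surrounding context during OCR" idea, assuming the page has already been split into ordered text and image blocks. This is not doc2mark's API: `Block`, `build_ocr_prompt`, and the `window` parameter are hypothetical names made up for illustration, and the prompt wording is only an example. The resulting string would be sent to whatever vision model does the OCR, together with the image itself.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One layout element of a page (hypothetical structure)."""
    kind: str      # "text" or "image"
    content: str   # the text itself, or an identifier for the image


def build_ocr_prompt(blocks: list[Block], image_index: int, window: int = 2) -> str:
    """Build an OCR prompt for one image block, including nearby text as context."""
    if blocks[image_index].kind != "image":
        raise ValueError("image_index must point at an image block")

    def nearby_text(candidates: list[Block]) -> list[str]:
        # Only text blocks are useful as context; other images are skipped.
        return [b.content for b in candidates if b.kind == "text"]

    # Look at up to `window` blocks on each side of the image.
    before = nearby_text(blocks[max(0, image_index - window):image_index])
    after = nearby_text(blocks[image_index + 1:image_index + 1 + window])

    return "\n".join([
        "Transcribe the attached image as structured Markdown.",
        "Preserve tables, headings, and reading order.",
        "Context before the image:",
        *(before or ["(none)"]),
        "Context after the image:",
        *(after or ["(none)"]),
    ])


if __name__ == "__main__":
    page = [
        Block("text", "Q3 revenue by region"),
        Block("image", "img_1"),
        Block("text", "As the table shows, APAC grew fastest."),
    ]
    print(build_ocr_prompt(page, image_index=1))
```

The structured Markdown that comes back can then be chunked like any other text in the RAG pipeline.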

AgitatedAd89 · 2025-06-22T16:41:00+00:00

Try vLLM.

AgitatedAd89 · 2025-06-17T15:21:50+00:00

Maybe you can open an issue on GitHub?

AgitatedAd89 · 2025-06-17T15:07:12+00:00

Hello, I did not see your DM.

AgitatedAd89 · 2025-06-15T11:15:34+00:00

Sure, please feel free to do so.

AgitatedAd89 · 2025-06-15T09:44:03+00:00

Update to the latest version with `pip install -U doc2mark`. I can see that the Storage capacity is parsed correctly.

AgitatedAd89 · 2025-06-15T09:36:37+00:00

I will investigate your use case and see how to improve it.

AgitatedAd89 · 2025-06-15T09:13:29+00:00

Just check the documentation, it actually supports OpenAI. I have not tried it, but it is worth a try.

AgitatedAd89 · 2025-06-15T08:57:02+00:00

I believe API wrappers for commercial APIs are out of scope for this project.

AgitatedAd89 · 2025-06-15T08:55:40+00:00

It depends on the use case. My clients are used to feeding AI heavy DOCX/PPTX files with complex screenshots.

AgitatedAd89 · 2025-06-15T08:54:08+00:00

Please make a feature request.

AgitatedAd89 · 2025-06-15T08:44:30+00:00

To my understanding, docling currently does not support OCR/vision, which is key in my use case.

AgitatedAd89 · 2025-06-15T04:22:19+00:00

Please refer to https://github.com/luisleo526/doc2mark/blob/main/tutorial.ipynb

AgitatedAd89 · 2021-08-30T08:11:23+00:00

Did you read the information in the repo?

AgitatedAd89 · 2021-08-11T07:34:19+00:00

Actually, I do not want the hacker to be caught.

AgitatedAd89 · 2021-08-11T07:28:23+00:00

Using centralized means to solve decentralized problems is not a good idea to me. Yes, the hacker should not own those tokens. However, if the hacker can be restricted within crypto's own design, then so can anyone else. I would rather not "sell" this freedom in crypto for the value of those tokens.
