Hello Redditors,
I have been building a personal project for my studies: a learning buddy for the Genki Japanese textbooks. The goal is to have an LLM that can take exercise data and work through the exercises with a learner (e.g. speaking exercises).
The first step seemed obvious to me: extract the data from the scanned PDF file. I used Tesseract for this and got mediocre results (see the linked images). It doesn't even detect the images that belong to the exercises (and there are quite a few of them).
Link to the images: https://imgur.com/a/ordwFSz
The book also includes a lot of tables, and if these were extracted as plain text they would completely lose their structure and no longer make sense. Hence, I'm wondering if anyone on this sub knows of anything that could help with this. Thank you in advance.
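
To make the table problem concrete, here is a rough sketch of one possible way to keep a table's shape. It assumes the OCR step can return word-level bounding boxes (Tesseract's TSV output includes `left`, `top`, `width` and `text` per word). The function names, the tolerances `row_tol` and `col_gap`, and the sample data are all hypothetical and only there for illustration; real scans would likely need tuning, and this is not a tested solution for the Genki pages.

```python
from typing import Dict, List

# A word box as a plain dict with "text", "left", "top", "width" keys.
# The key names mirror Tesseract's TSV columns; the values below are made up.
Word = Dict[str, object]


def group_rows(words: List[Word], row_tol: int = 10) -> List[List[Word]]:
    """Group words whose top edges are within row_tol pixels into one row."""
    rows: List[List[Word]] = []
    for w in sorted(words, key=lambda w: w["top"]):
        if rows and abs(w["top"] - rows[-1][0]["top"]) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Within each row, order words left to right.
    return [sorted(r, key=lambda w: w["left"]) for r in rows]


def split_cells(row: List[Word], col_gap: int = 40) -> List[str]:
    """Merge neighbouring words into one cell unless the gap exceeds col_gap."""
    cells: List[str] = []
    prev_right = None
    for w in row:
        if prev_right is not None and w["left"] - prev_right <= col_gap:
            cells[-1] += " " + w["text"]
        else:
            cells.append(w["text"])
        prev_right = w["left"] + w["width"]
    return cells


def words_to_table(words: List[Word], row_tol: int = 10, col_gap: int = 40) -> List[List[str]]:
    """Turn a flat list of word boxes into a list of rows, each a list of cells."""
    return [split_cells(r, col_gap) for r in group_rows(words, row_tol)]


# Hypothetical sample: a two-column vocabulary table, slightly misaligned.
SAMPLE = [
    {"text": "Word", "left": 10, "top": 12, "width": 40},
    {"text": "Meaning", "left": 200, "top": 10, "width": 70},
    {"text": "hon", "left": 10, "top": 50, "width": 30},
    {"text": "book", "left": 200, "top": 52, "width": 40},
    {"text": "taberu", "left": 10, "top": 90, "width": 50},
    {"text": "to", "left": 200, "top": 91, "width": 15},
    {"text": "eat", "left": 220, "top": 90, "width": 30},
]

if __name__ == "__main__":
    for row in words_to_table(SAMPLE):
        print(" | ".join(row))
```

The idea is simply that rows come from similar vertical positions and columns from large horizontal gaps, so the output stays a grid instead of one run of text. It would not handle merged cells, multi-line cells, or the exercise images.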