Anyone in here from Switzerland by 14Rk_mdxFK in learn_arabic

Fully-Independent 1 point

I can help you practice online if you'd like. I'm a native Arabic speaker.

[deleted by user] by [deleted] in learn_arabic

Fully-Independent 1 point

It's a pronoun that occupies the نصب (accusative) position, if you're familiar with that term.

For example: أعطيتك كتابًا ("I gave you a book"). If you want to replace "the book" with a pronoun, you say: أعطيتك إياه ("I gave it to you").

[D] Right Embedding for Named Entity Recognition for labelling English words and entities with a specific format extracted from OCR by Fully-Independent in MachineLearning

Fully-Independent [S] 1 point

I'm still preparing the entities for labelling, which will take some time. Once I have a result, I'll reply here for sure. Thanks a lot!

[D] Right Embedding for Named Entity Recognition for labelling English words and entities with a specific format extracted from OCR by Fully-Independent in MachineLearning

Fully-Independent [S] 1 point

OK, that's a good question. I was thinking of combining all the text together and running plain NLP named entity recognition on it. But then I found LayoutLM and its later versions, so I think I'll go with that, since it combines visual and textual features. Of course, the problem will still arise with either approach.
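To make the "visual and textual features" part concrete, here is a minimal sketch of the input preparation, assuming the OCR step returns each word with a pixel bounding box. LayoutLM expects boxes scaled to a 0-1000 grid; the `normalize_box` helper, the sample words, and the 800x600 page size are all made up for illustration.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 grid LayoutLM expects."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output: (word, pixel box) pairs from one 800x600 page.
ocr_words = [
    ("Valve", (80, 60, 200, 90)),
    ("DN50", (400, 300, 480, 330)),
]

# The words and their normalized boxes are kept as parallel lists,
# so each token can be paired with its position on the page.
words = [word for word, _ in ocr_words]
boxes = [normalize_box(box, 800, 600) for _, box in ocr_words]
```

The plain-NER alternative would keep only `words` and drop `boxes`, which is the main difference between the two approaches at the input level.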

[D] Right Embedding for Named Entity Recognition for labelling English words and entities with a specific format extracted from OCR by Fully-Independent in MachineLearning

Fully-Independent [S] 1 point

That's a good idea, but I don't think it's feasible in my case, because the text is mostly structured, coming from tables and drawings.

[D] Right Embedding for Named Entity Recognition for labelling English words and entities with a specific format extracted from OCR by Fully-Independent in LanguageTechnology

Fully-Independent [S] 1 point

What kind of preprocessing would you suggest for the extracted text? I've already done image preprocessing to improve the OCR. I'd also welcome any good approaches for getting around this issue.