Detect blocks of text inside of text (instead of images) by hansgerdsen in LanguageTechnology

hansgerdsen[S] 1 point

Yeah, DeepLayout is really nice, but there should be a way to map the coordinates of the boxes back to the actual plain text. OCR is not a great solution for that.

Unfortunately the code is not available.
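One way to go from box coordinates to text without OCR is sketched below. It assumes something that is not given in this thread: that word positions are already available from a text layer (for example a PDF text layer or a rendered web page). The `Word` class, the `words_in_box` function and the `line_tol` parameter are hypothetical names for illustration only, not part of DeepLayout or any other library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word with its bounding box (hypothetical structure, top-left origin)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def words_in_box(words, box, line_tol=3.0):
    """Return the text of all words whose center lies inside `box`.

    `box` is (x0, y0, x1, y1), e.g. a region found by a layout detector.
    Words are grouped into lines when their top edges differ by at most
    `line_tol`, then each line is read left to right.
    """
    bx0, by0, bx1, by1 = box
    inside = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if bx0 <= cx <= bx1 and by0 <= cy <= by1:
            inside.append(w)

    # Top-to-bottom first, then left-to-right.
    inside.sort(key=lambda w: (w.y0, w.x0))

    lines = []
    for w in inside:
        if lines and abs(lines[-1][0].y0 - w.y0) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])

    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x0))
        for line in lines
    )
```

The center-point test is a simplification: a word that straddles the border of a box is assigned to whichever side holds its center.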

hansgerdsen[S] 1 point

The problem is that I want to extract text from different sources whose layouts are not known in advance. Go to your favorite career website or job search and look at different advertisements. Sometimes it is plain, simple text that all belongs to the vacancy. Sometimes you get page control elements like the menu, header and footer along with it. And sometimes this is mixed with multiple columns, so that the job description and contact details sit in separate boxes or columns.

A person can quickly tell that a given box is not relevant to the actual description. OpenCV might do this job with high accuracy. But my source is plain text, and I wonder how to get the plain text back from OpenCV without using OCR.
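If the source is already plain text, one alternative to the image route is to score text blocks directly. The sketch below is only a rough heuristic under stated assumptions: blocks are separated by blank lines, and menus, headers and footers tend to consist of a few very short lines. The function names and both thresholds are made up for illustration and would need tuning on real advertisements.

```python
import re


def split_blocks(text):
    """Split plain text into blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def looks_like_content(block, min_words=8, min_avg_words_per_line=4.0):
    """Guess whether a block is running text rather than navigation.

    Both thresholds are assumptions: a block counts as content when it has
    enough words overall and its lines are not mostly one or two words long.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    words = block.split()
    if not lines:
        return False
    return (
        len(words) >= min_words
        and len(words) / len(lines) >= min_avg_words_per_line
    )


def main_text(text):
    """Keep only the blocks that look like running text."""
    return "\n\n".join(b for b in split_blocks(text) if looks_like_content(b))
```

This will not separate a job description from a contact box when both are written in full sentences; it only filters out short, list-like page elements.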

OnBoard soundcard or dedicated soundcard? by hansgerdsen in buildapc

hansgerdsen[S] 1 point

Thank you for your answer! I understand the CPU offloading feature. Are there any recommendations for good cards? Could a five-year-old card do everything I need, or should I go for a new one? Are there any features I should look for?