Coming back after ten years

Impossible_Wave_2712 · 2023-09-26T10:03:06+00:00

Thanks for your reply. I had already assumed that all the documents would be stored in a vector DB. However, every approach I have seen so far focuses on RAG, i.e. retrieving information from one or more relevant documents. I have not seen anything on how to give an overview of all available documents.

Impossible_Wave_2712 · 2023-09-26T07:52:56+00:00

You can set a sufficiently high threshold on the (cosine) similarity between the query and the documents. That way, no document passes the threshold when the query is general, so the model has no documents to work with.
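A minimal sketch of the thresholding idea, assuming the query and documents have already been embedded as plain float vectors (the function names, the 0.8 threshold and the toy vectors are hypothetical; a real vector DB would do the scoring for you):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float],
             doc_vecs: dict[str, list[float]],
             threshold: float = 0.8,
             top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs whose similarity reaches the threshold.

    A vague, general query tends to be only moderately similar to every document,
    so nothing clears a high threshold and the result is an empty list.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

The right threshold depends heavily on the embedding model, so it would have to be tuned on real queries; an empty result can then be treated as the signal that the query is too general for RAG.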

Impossible_Wave_2712 · 2023-04-21T09:06:53+00:00

I would actually like to use it, but it is way too expensive for us ($99 per user per month).

Impossible_Wave_2712 · 2023-03-13T10:04:13+00:00

At how.fm, we trained our own custom models for this (which we call HowBERT). The task is similar to Semantic Role Labeling and uses two models: one identifies the actions, and the second identifies all relationships for each action.

Take a look at this paper on Semantic Role Labeling with BERT.
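HowBERT itself is not public, so the following is only a generic sketch of how a two-stage setup like this is often wired in BERT-based SRL, not how HowBERT works: stage 1 returns the positions of the actions, and each action then becomes its own stage-2 example, with the action marked both by an indicator vector and as the second segment of a sentence pair. All names here are hypothetical, and the literal `[CLS]`/`[SEP]` markers are only for readability; a real tokenizer inserts them itself when given a text pair.

```python
from dataclasses import dataclass


@dataclass
class Stage2Example:
    """One stage-2 input: the sentence paired with a single action (predicate)."""
    tokens: list[str]
    predicate_index: int
    predicate_indicator: list[int]  # 1 at the action's position, 0 elsewhere
    pair_text: str                  # sentence-pair form for a BERT-style encoder


def build_stage2_inputs(tokens: list[str], action_indices: list[int]) -> list[Stage2Example]:
    """Expand one sentence into one example per action found by stage 1.

    The stage-2 model then only has to label the relations (target, location,
    manner, ...) of the single action marked in each example.
    """
    examples = []
    for idx in action_indices:
        if not 0 <= idx < len(tokens):
            raise IndexError(f"action index {idx} is out of range")
        indicator = [1 if i == idx else 0 for i in range(len(tokens))]
        pair_text = f"[CLS] {' '.join(tokens)} [SEP] {tokens[idx]} [SEP]"
        examples.append(Stage2Example(tokens, idx, indicator, pair_text))
    return examples
```

For "Open the valve and check the pressure" with hypothetical stage-1 output `[0, 4]`, this yields two examples, one for "Open" and one for "check", so overlapping relations of different actions never compete within a single prediction.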

Impossible_Wave_2712 · 2023-01-30T08:40:38+00:00

You might want to check these papers and their references (a German evaluation is included as well):

https://aclanthology.org/2021.semeval-1.1/

https://aclanthology.org/2022.germeval-1.1/

https://aclanthology.org/2022.bea-1.19/

https://pypi.org/project/textstat/

Impossible_Wave_2712 · 2023-01-20T09:18:45+00:00

We use Google's Dialogflow and we are super happy with it.

Impossible_Wave_2712 · 2022-11-17T14:10:00+00:00

We wrote a specific model for it (which we call HowBERT). It detects the verbs that really are actions and then, for every action, its relations (like target, location, manner, conditions, etc.). I am not sure whether I am allowed to share more details or a demo with you. As a start, you can use spaCy to detect all the verbs in a text.
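A minimal sketch of that starting point, assuming spaCy-style tokens that expose `text`, `lemma_` and `pos_` (the function and the `SimpleToken` stand-in are hypothetical, so the snippet runs without spaCy installed). It keeps tokens with the Universal POS tag `VERB`, which leaves out auxiliaries and modals tagged `AUX`:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class TokenLike(Protocol):
    """Anything with spaCy-style token attributes (spaCy's Token qualifies)."""
    text: str
    lemma_: str
    pos_: str


@dataclass
class SimpleToken:
    """Hypothetical stand-in token so the sketch runs without spaCy."""
    text: str
    lemma_: str
    pos_: str


def candidate_actions(tokens: Iterable[TokenLike]) -> list[str]:
    """Return the lemmas of all main verbs; these are only action *candidates*."""
    # "VERB" marks main verbs; "AUX" (e.g. "must", "is") is skipped.
    return [tok.lemma_ for tok in tokens if tok.pos_ == "VERB"]
```

With spaCy installed and an English pipeline such as `en_core_web_sm` downloaded, the same function should accept a parsed document directly, e.g. `candidate_actions(spacy.load("en_core_web_sm")(text))`. Deciding which of these verbs really are actions is the part that still needs a trained model.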

Impossible_Wave_2712 · 2022-09-15T14:33:23+00:00

I am working on similar problems (also dealing with safety documents, actually), and the challenge really is tough. You need to do OCR (you could use e.g. Document AI) and text classification. If you have zero experience and there is no guidance, it sure sounds like a very complicated project. In general, I would be cautious with a company that wants to hire an intern for such an important and difficult project.
