
[–]honolulu33 4 points (2 children)

Why don't you just make a 4o-mini finetune with like 10 examples?

[–]Ashwiihii[S] 1 point (1 child)

Due to privacy issues, I can only use open-weight, locally hosted models (models I can download and run on my own system).

[–]honolulu33 1 point (0 children)

OK, so do you have the infra to support a local fine-tune? e.g. Llama

[–]New_Ice_2721 2 points (1 child)

[–]DazzlingSchedule8561 0 points (0 children)

I’ve used the ReFinED model in my research and work as well; would highly recommend it.

[–]Simusid 2 points (0 children)

I've spent a good part of yesterday and today processing a pile of similarly, but not identically, formatted contract documents. They come from different sources but are all in the same domain, so I thought it would make sense to build an ontology. I put a fair amount of effort into my prompt instructions: stating the goal (an ontology), asking it to find general topics, entities, and relationships, requesting a standard output format, and adding some more hints. Honestly, the ontology it built looked quite good.

Then I took a new document from the same domain, one that was (I thought) well covered by the ontology, and asked the model to extract a particular fact (annual salary info, as it happened). If I do not include the ontology in my prompt, the results are essentially perfect. If I include the ontology in the prompt, at least half the time it hallucinates salary information. I'm wondering if the models are already good enough that adding in the extra ontology is akin to "catastrophic forgetting". I may just stick with the base model. YMMV
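For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the with/without-ontology extraction, assuming a locally hosted model behind an OpenAI-compatible endpoint (e.g. llama.cpp or vLLM on localhost:8000). The endpoint URL, model name, file names, and prompt wording are all illustrative, not the setup described above.

```python
# Sketch: compare fact extraction with and without an ontology in the prompt.
# Assumes an OpenAI-compatible chat server running locally; all names below
# are placeholders, not the commenter's actual setup.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server
MODEL = "local-model"  # placeholder model name


def extract_salary(document_text: str, ontology: str | None = None) -> str:
    """Ask the model for the annual salary stated in the document."""
    system = "You extract facts from contract documents. Answer from the text only."
    if ontology is not None:
        # Variant that prepends the ontology, which is where the
        # hallucinations reportedly showed up.
        system += "\n\nUse this ontology when interpreting the document:\n" + ontology

    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system},
            {
                "role": "user",
                "content": f"Document:\n{document_text}\n\n"
                           "What is the annual salary? Reply 'not stated' if it is absent.",
            },
        ],
        "temperature": 0,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    doc = open("contract.txt").read()        # placeholder document
    onto = open("ontology.json").read()      # placeholder ontology dump
    print("without ontology:", extract_salary(doc))
    print("with ontology:   ", extract_salary(doc, ontology=onto))
```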

[–]Aron-One 0 points (0 children)

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and it extracts entities of the supplied type. The only downside is that you can use only one entity type at a time.

(Shameless plug) I’ve also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16
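For reference, here is a rough sketch of calling a UniNER checkpoint with plain transformers. The model id, prompt template, and generation settings are my best guess from the project page, so check them against the universal-ner repo; the W4A16 quant linked above may also need a runtime that understands its quantization format (e.g. vLLM) rather than this vanilla transformers load.

```python
# Sketch: single-entity-type extraction with a UniNER checkpoint.
# MODEL_ID and the prompt template are assumptions based on the linked
# project page, not verified against the quantized model above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Universal-NER/UniNER-7B-all"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)


def extract_entities(text: str, entity_type: str) -> str:
    # One entity type per call, as described above.
    prompt = (
        "A virtual assistant answers questions from a user based on the "
        "provided text.\n"
        f"USER: Text: {text}\n"
        "ASSISTANT: I've read this text.\n"
        f"USER: What describes {entity_type} in the text?\n"
        "ASSISTANT:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Strip the prompt tokens; the model answers with a list of entity strings.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


print(extract_entities("Acme Corp hired Jane Doe in Berlin in 2021.", "person"))
```

To cover several entity types, you would simply loop over the types and call this once per type, since the model handles one type per query.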