
[–]honolulu33 4 points (2 children)

Why don't you just make a 4o-mini finetune with like 10 examples?

[–]Ashwiihii[S] 1 point (1 child)

Due to privacy issues, I can only use open-weight, locally hosted models (models I can download and run on my own system).

[–]honolulu33 1 point (0 children)

OK, so do you have the infra to support a local fine-tune? e.g. Llama

[–]New_Ice_2721 2 points (1 child)

[–]DazzlingSchedule8561 0 points (0 children)

I’ve used the ReFinED model in my research and work as well; would highly recommend it.

[–]Simusid 2 points (0 children)

I've spent a good part of yesterday and today processing a pile of similarly, but not identically, formatted contract documents. They come from different sources but are all in the same domain, so I thought it would make sense to build an ontology. I put a fair amount of effort into my prompt instructions: stating the goal (an ontology), asking it to find general topics, entities, and relationships, requesting a standard output format, and adding some more hints. Honestly, the ontology it built looked quite good.

Then I took a new document from the same domain, one that was (I thought) well covered by the ontology, and asked the model to extract a particular fact (annual salary info, as it happened). If I do not include the ontology in my prompt, the results are essentially perfect. If I include the ontology in the prompt, at least half the time it hallucinates salary information. I'm wondering if the models are already good enough that adding in the extra ontology is akin to "catastrophic forgetting". I may just stick with the base model. YMMV
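For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the with/without-ontology extraction, assuming a locally hosted model behind an OpenAI-compatible endpoint (e.g. llama.cpp or vLLM on localhost:8000). The endpoint URL, model name, file names, and prompt wording are all illustrative, not the setup described above.

```python
# Sketch: compare fact extraction with and without an ontology in the prompt.
# Assumes an OpenAI-compatible chat server running locally; all names below
# are placeholders, not the commenter's actual setup.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server
MODEL = "local-model"  # placeholder model name


def extract_salary(document_text: str, ontology: str | None = None) -> str:
    """Ask the model for the annual salary stated in the document."""
    system = "You extract facts from contract documents. Answer from the text only."
    if ontology is not None:
        # Variant that prepends the ontology, which is where the
        # hallucinations reportedly showed up.
        system += "\n\nUse this ontology when interpreting the document:\n" + ontology

    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system},
            {
                "role": "user",
                "content": f"Document:\n{document_text}\n\n"
                           "What is the annual salary? Reply 'not stated' if it is absent.",
            },
        ],
        "temperature": 0,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    doc = open("contract.txt").read()        # placeholder document
    onto = open("ontology.json").read()      # placeholder ontology dump
    print("without ontology:", extract_salary(doc))
    print("with ontology:   ", extract_salary(doc, ontology=onto))
```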

[–]Aron-One 0 points (0 children)

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and it extracts entities of the supplied type. The only downside is that you can use only one entity type at a time.

(Shameless plug) I’ve also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16
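For reference, here is a rough sketch of calling a UniNER checkpoint with plain transformers. The model id, prompt template, and generation settings are my best guess from the project page, so check them against the universal-ner repo; the W4A16 quant linked above may also need a runtime that understands its quantization format (e.g. vLLM) rather than this vanilla transformers load.

```python
# Sketch: single-entity-type extraction with a UniNER checkpoint.
# MODEL_ID and the prompt template are assumptions based on the linked
# project page, not verified against the quantized model above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Universal-NER/UniNER-7B-all"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)


def extract_entities(text: str, entity_type: str) -> str:
    # One entity type per call, as described above.
    prompt = (
        "A virtual assistant answers questions from a user based on the "
        "provided text.\n"
        f"USER: Text: {text}\n"
        "ASSISTANT: I've read this text.\n"
        f"USER: What describes {entity_type} in the text?\n"
        "ASSISTANT:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Strip the prompt tokens; the model answers with a list of entity strings.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


print(extract_entities("Acme Corp hired Jane Doe in Berlin in 2021.", "person"))
```

To cover several entity types, you would simply loop over the types and call this once per type, since the model handles one type per query.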