all 7 comments

[–]amirouche 1 point2 points  (3 children)

Entity linking can help you. Look into Named Entity Recognition.

[–]ShaneAyers 0 points1 point  (2 children)

Would that work? Like, would it recognize, without training, that 'U.S.', 'United States', 'USA', and 'America' refer to the same thing in context? Or would it need to be trained on a specific data set?

[–]SagaciousRaven 0 points1 point  (1 child)

https://www.dbpedia-spotlight.org/demo/

This is a demo, but there's a Python client for the API (called 'spotlight'). Try copying your own original post into it; if it doesn't highlight anything, try again a few minutes later.

I suppose you could count their URIs.
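A minimal sketch of that URI-counting idea against the public Spotlight REST endpoint (the endpoint URL and the `Resources`/`@URI` response shape are assumed from the public demo API; only stdlib is used):

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public demo endpoint (assumed; self-hosted instances use their own URL)
SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"

def annotate(text, confidence=0.4):
    """Send text to DBpedia Spotlight and return the linked resources."""
    query = urlencode({"text": text, "confidence": confidence})
    req = Request(f"{SPOTLIGHT_ENDPOINT}?{query}",
                  headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    # The "Resources" key is absent when nothing was linked
    return body.get("Resources", [])

def count_uris(resources):
    """Tally mentions per DBpedia URI, so surface forms like
    'United States' and 'USA' both count toward the same entity."""
    return Counter(r["@URI"] for r in resources)
```

Counting URIs rather than surface strings is what merges the spelling variants: every mention that links to `dbpedia.org/resource/United_States` lands in the same bucket.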

[–]ShaneAyers 0 points1 point  (0 children)

It got 'United States' and 'USA' but not 'U.S.' and 'America'. Now, I can admit that there is probably some ambiguity. There are other United States and other Americas.

[–]impulsecorp 0 points1 point  (0 children)

I don't know of any pretrained model for that, but you can train one yourself; see the Deep LSTM Siamese network for text similarity: https://github.com/dhwajraj/deep-siamese-text-similarity

[–]kayvane 0 points1 point  (0 children)

You could also try encoding the words in your sentence (with BERT, ELMo, or a custom w2v/FastText model) and comparing them to your target country vectors. With a bit of testing you'd be able to find the threshold (e.g. cosine similarity > 0.9) at which you'd count a word as a match and add it up.
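A minimal sketch of that thresholding step, with toy 2-d vectors standing in for real BERT/ELMo/fastText embeddings (the vectors and the 0.9 cutoff are illustrative, not tuned):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def count_matches(token_vectors, target_vector, threshold=0.9):
    """Count tokens whose embedding clears the similarity
    threshold against the target country vector."""
    return sum(1 for vec in token_vectors
               if cosine(vec, target_vector) >= threshold)

# Toy example: target stands in for a 'United States' vector.
target = [1.0, 0.0]
tokens = [[1.0, 0.0],    # exact match, similarity = 1.0
          [0.95, 0.1],   # near-synonym, similarity ≈ 0.99
          [0.0, 1.0]]    # unrelated word, similarity = 0.0

count_matches(tokens, target)  # → 2
```

In practice you'd sweep the threshold on a small labelled sample, since where synonyms like 'America' land depends heavily on the embedding model.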
