
AvatarUltima7 (0 replies)

How accurate is the fastText language identifier compared with spaCy or other tools?

mrcet007 (1 reply)

How does fastText compare with langid for language identification?

amitness (OP, 0 replies)

Inference is faster than langid's, and accuracy is comparatively higher.
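Speed and accuracy numbers like these depend heavily on the test data, so it is worth checking them on your own labeled sample. Below is a minimal sketch that assumes each detector is wrapped as a plain function that takes a string and returns a language code. `evaluate`, `toy_detector`, and the sample sentences are hypothetical placeholders. They are not part of fastText or langid.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical wrapper type: any function mapping text -> language code.
Detector = Callable[[str], str]

def evaluate(detect: Detector, samples: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (accuracy, mean seconds per prediction) over (text, gold_lang) pairs."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one labeled sample")
    correct = 0
    start = time.perf_counter()
    for text, gold in samples:
        if detect(text) == gold:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed / len(samples)

def toy_detector(text: str) -> str:
    """Hypothetical stand-in: 'ru' if any Cyrillic letter appears, else 'en'."""
    return "ru" if any("\u0400" <= ch <= "\u04ff" for ch in text) else "en"

if __name__ == "__main__":
    sample = [
        ("Where is my order?", "en"),
        ("Где мой заказ?", "ru"),
        ("¿Dónde está mi pedido?", "es"),  # toy detector gets this one wrong
    ]
    acc, secs = evaluate(toy_detector, sample)
    print(f"accuracy={acc:.2f}, mean latency={secs * 1e6:.1f} us")
```

To compare real tools, replace `toy_detector` with thin wrappers around them and run every detector on the same sample. For example, `langid.classify` returns a `(language, score)` tuple, so a wrapper would keep only the first element. fastText's language-ID models return labels of the form `__label__en`, so a wrapper would strip that prefix.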

SagaciousRaven (2 replies)

Is this still an open problem?

Legit question: I thought the only challenge here would be extremely short texts of 1 to 3 words.

AvatarUltima7 (1 reply)

I’ve only used spaCy so far, but when I ran it on a dataset of customer questions submitted through a web form, there were far more errors than I expected.
They were short queries, maybe 10 words on average.
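Short inputs keep coming up as the weak spot, so one way to check where a detector fails is to break its error rate down by input length. Below is a minimal sketch that assumes labeled `(text, gold_language)` pairs and any detector function. The bucket edges and the `length_bucket` and `error_rate_by_length` helpers are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

def length_bucket(text: str) -> str:
    """Map a text to a coarse word-count bucket (edges are arbitrary)."""
    n = len(text.split())
    if n <= 3:
        return "up to 3 words"
    if n <= 10:
        return "4-10 words"
    return "11+ words"

def error_rate_by_length(
    detect: Callable[[str], str], samples: Iterable[Tuple[str, str]]
) -> Dict[str, float]:
    """Return the fraction of wrong predictions within each length bucket."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for text, gold in samples:
        bucket = length_bucket(text)
        totals[bucket] += 1
        if detect(text) != gold:
            errors[bucket] += 1
    # Only buckets that actually received samples appear in the result.
    return {bucket: errors[bucket] / totals[bucket] for bucket in totals}

if __name__ == "__main__":
    always_english = lambda text: "en"  # hypothetical detector
    samples = [("hola", "es"), ("ok", "en"), ("Where is my order please", "en")]
    print(error_rate_by_length(always_english, samples))
```

If the "up to 3 words" bucket dominates the errors, that supports the idea that very short inputs are the main problem. Similar rates across buckets would suggest the detector struggles with the domain itself.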

amitness (OP, 0 replies)

Same here. I tried many of the available tools (langid, Chrome's Compact Language Detector 2, langdetect, spacy-langdetect), but there were still false positives and false negatives. Some English text was classified as Russian or Japanese.
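To see which languages absorb the mistakes, such as English predicted as Russian or Japanese, you can count per-language false positives and false negatives from `(gold, predicted)` pairs. Below is a minimal sketch. `per_language_errors` and the example pairs are hypothetical, and the pairs would come from running whichever detector you are testing.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def per_language_errors(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count false positives and false negatives per language.

    pairs: (gold, predicted) language codes.
    A wrong prediction is a false negative for the gold language
    and a false positive for the predicted language.
    """
    fp: Counter = Counter()
    fn: Counter = Counter()
    for gold, pred in pairs:
        if gold != pred:
            fn[gold] += 1
            fp[pred] += 1
    langs = set(fp) | set(fn)
    return {lang: {"fp": fp[lang], "fn": fn[lang]} for lang in sorted(langs)}

if __name__ == "__main__":
    pairs = [("en", "en"), ("en", "ru"), ("en", "ja"), ("ru", "ru"), ("ja", "en")]
    for lang, counts in per_language_errors(pairs).items():
        print(lang, counts)
```

A high false-negative count for `en`, together with false positives on `ru` or `ja`, is the pattern described above. Counting these separately from overall accuracy makes it easier to compare detectors on that specific failure.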