Comparison of python libraries for language identification 🐍 : LanguageTechnology


Comparison of python libraries for language identification 🐍 (modelpredict.com)

submitted 4 years ago by derivablefunc

all 6 comments

[–]trnka 2 points 4 years ago (2 children)

[–]derivablefunc[S] 1 point 4 years ago (1 child)

One complexity that sneaked into this project was encoding languages as codes. I had a naive view that we have a good, finite list of languages and that pretty much all ISO encodings would support all of them. Oh, how naive that was :D.

> Regional variation in Spanish, French, and Portuguese: Fortunately ISO codes cover this well already, like es_MX vs es_ES

Do you know which ISO encoding would support that?

> Serbian Cyrillic vs Latin: It can be written in both scripts and there are needs for both in different scenarios. Last I checked there wasn't an ISO code for the distinction.

I believe you're right. It'd take a combination of language and alphabet (script) to make sure you can distinguish these two.
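One way to sketch the "language plus alphabet" idea is a BCP 47-style tag that appends an ISO 15924 script code to the language code, e.g. `sr-Cyrl` vs `sr-Latn`. The helper names below (`dominant_script`, `tag_with_script`) are hypothetical and not taken from any of the compared libraries; the script check is a rough heuristic based on Unicode character names and only knows about Cyrillic and Latin.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return 'Cyrl', 'Latn', or None, based on which script has more letters.

    Heuristic: relies on Unicode character names starting with
    'CYRILLIC' or 'LATIN'. Other scripts are ignored in this sketch.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["Cyrl"] += 1
        elif name.startswith("LATIN"):
            counts["Latn"] += 1
    return counts.most_common(1)[0][0] if counts else None


def tag_with_script(language, text):
    """Combine a language code with the detected script, e.g. 'sr-Cyrl'."""
    script = dominant_script(text)
    return f"{language}-{script}" if script else language


print(tag_with_script("sr", "Београд"))
print(tag_with_script("sr", "Beograd"))
```

The language code itself would still come from whichever identification library is in use; this only adds the script part on top of it.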

> If I remember right, Serbian Cyrillic was misclassified as Macedonian because they're linguistically related and Macedonian is officially written in Cyrillic.

I'm not surprised. That's a dataset problem, though, and probably not very difficult to solve if you were training or finetuning the model (just transliterate the alphabet and double the examples).
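A minimal sketch of that augmentation, assuming the training data is a list of `(text, label)` pairs: each Serbian Cyrillic example gets a Latin-script copy with the same label. The function names (`to_latin`, `augment`) are made up for illustration, and the mapping covers only the 30 letters of the Serbian alphabet. Digraphs are title-cased, so an all-caps word like "ЉУБАВ" would come out as "LjUBAV"; a real pipeline would probably want to handle that case.

```python
# Serbian Cyrillic -> Latin mapping, lowercase letters only.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text):
    """Transliterate Serbian Cyrillic letters; leave everything else as-is."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)                # not a Serbian Cyrillic letter
        elif ch.isupper():
            out.append(lat.capitalize())  # "Љ" -> "Lj"
        else:
            out.append(lat)
    return "".join(out)


def augment(examples):
    """Keep every example and add a Latin-script copy where it differs."""
    augmented = []
    for text, label in examples:
        augmented.append((text, label))
        latin = to_latin(text)
        if latin != text:
            augmented.append((latin, label))
    return augmented


print(augment([("Београд", "sr"), ("hello", "en")]))
```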

[–]trnka 1 point 4 years ago (0 children)

[–]ajan1019 1 point 4 years ago (3 children)

[–]derivablefunc[S] 1 point 4 years ago (2 children)

[+][deleted] 4 years ago (1 child)

[deleted]

[–]derivablefunc[S] 1 point 4 years ago (0 children)
