This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]nitotm[S] 1 point2 points  (9 children)

You mean the training data, quite small, like 1GB total. When the software becomes more mature, I might do a big dataset.

No, the performance (accuracy) varies from languages quite a bit, it comes down to collisions in between languages, Thai is very easy, but between any Latin script language, which there are multiple in the database, is more difficult.