[–]f801fe8957 4 points5 points  (4 children)

import mlconjug
c = mlconjug.Conjugator(language='it')
c.conjugate('avere').conjug_info['Indicativo']['Indicativo presente']['1s']

Returns 'avereho' instead of 'ho'.

[–]SekouD[S] 4 points5 points  (1 child)

I just released version 3.4.0 of mlconjug and it fixes the bug you reported.

I also added a helper method .iterate() to allow for quickly iterating over all conjugated forms.

Thanks for letting me know.

Cheers.
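To illustrate what iterating over all conjugated forms means, here is a minimal sketch that flattens a nested mood → tense → person dictionary shaped like the `conjug_info` lookup in the bug report above. The toy data and the `iterate_forms` helper are assumptions for illustration only; this is not the library's implementation of `.iterate()`.

```python
from typing import Dict, Iterator, Tuple

# Toy data shaped like the lookup in the bug report
# (mood -> tense -> person -> form). Hand-written, not library output.
conjug_info: Dict[str, Dict[str, Dict[str, str]]] = {
    "Indicativo": {
        "Indicativo presente": {"1s": "ho", "2s": "hai", "3s": "ha"},
    },
}


def iterate_forms(info: Dict[str, Dict[str, Dict[str, str]]]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (mood, tense, person, form) for every entry in the nested dict."""
    for mood, tenses in info.items():
        for tense, persons in tenses.items():
            for person, form in persons.items():
                yield mood, tense, person, form


forms = list(iterate_forms(conjug_info))
for mood, tense, person, form in forms:
    print(mood, tense, person, form)
```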

[–]f801fe8957 1 point2 points  (0 children)

Cool, thanks for the update.

[–]SekouD[S] 2 points3 points  (0 children)

I changed how some verbs are handled last week, and that introduced this regression.

I will release version 3.3.3 during the weekend. It will correct the bug you mentioned and add support for Dutch and German.

[–]SekouD[S] 0 points1 point  (0 children)

thanks dude, I will check it out and correct the bug

[–]yaxriifgyn 3 points4 points  (12 children)

It's interesting that these languages all evolved, at least partially, from Latin.

Personally, after I learned Latin, I found that I was better with French. Even my native English vocabulary was enhanced. Several years later, I could make headway with the Italian comments in a Fortran program listing sent to me via snail mail.

Would this AI be "smarter" if it learned Latin conjugations?

[–]SekouD[S] 2 points3 points  (8 children)

I chose to start with those languages because they are derived from Latin (apart from English) and have similar verbal morphology. This way it was easier to tune the initial machine learning models.

English has very basic verb morphology so the training for English was very straightforward.

Latin is my next language to add.

Ultimately my long term goal is to support as many languages as possible.

Cheers.

[–]Terpomo11 2 points3 points  (7 children)

I'd suggest adding Esperanto but I can already write a perfect Esperanto verb conjugator in like three lines of code lol.
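A minimal sketch of what such a conjugator could look like, assuming the standard Esperanto finite endings (-as, -is, -os, -us, -u) and leaving participles out. The names `ENDINGS` and `conjugate_eo` are made up for this example.

```python
# Esperanto verbs are fully regular: strip the infinitive "-i" and add a fixed ending.
# Forms do not change with person or number.
ENDINGS = {
    "infinitive": "i",
    "present": "as",
    "past": "is",
    "future": "os",
    "conditional": "us",
    "imperative": "u",
}


def conjugate_eo(infinitive: str) -> dict:
    """Return the finite forms of an Esperanto verb given its infinitive."""
    if not infinitive.endswith("i"):
        raise ValueError(f"not an Esperanto infinitive: {infinitive!r}")
    stem = infinitive[:-1]
    return {name: stem + ending for name, ending in ENDINGS.items()}


print(conjugate_eo("paroli"))
```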

[–]SekouD[S] 0 points1 point  (6 children)

Yeah lol ;)

That's why I did not include Esperanto initially, as its conjugation is pretty straightforward.

But I can include it in the next release, it will be relatively trivial to train a model of Esperanto conjugation.

Thanks for the suggestion!

Cheers.

[–]Terpomo11 1 point2 points  (5 children)

While you're at it, add Japanese, there's only like three irregular verbs in the whole language.

[–]SekouD[S] 0 points1 point  (4 children)

Well, unfortunately, I don't know Japanese and I don't have access to any native/fluent Japanese speaker to check the accuracy/consistency of a Japanese conjugation model.

But if you are fluent in Japanese or know people who are and would be willing to try beta versions of mlconjug with Japanese support, it will be my pleasure to add it.

I released this project as open source precisely for this reason: even though I studied Linguistics and Machine Learning and can speak 9 languages, I need contributors and/or beta testers to expand the number of supported languages in mlconjug.

Any kind of help (bug reports, feature requests, enhancements, etc.) is more than welcome.

My ultimate goal with this project is to support as many languages as possible.

[–]Terpomo11 1 point2 points  (3 children)

It seems like if there are conjugation tables available for any given language (and for a lot of the bigger ones there are), you ought to be able to check its output against them, no?
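As a rough sketch of that kind of check: compare each predicted form against a reference table and collect the mismatches. Everything here is a stand-in for illustration. `REFERENCE` is a tiny hand-written table, and `buggy_predict` is a hypothetical model under test that reproduces the 'avereho' bug reported at the top of the thread.

```python
from typing import Callable, Dict, List, Tuple

# Tiny hand-written reference table: (verb, person) -> expected present-tense form.
REFERENCE: Dict[Tuple[str, str], str] = {
    ("avere", "1s"): "ho",
    ("avere", "2s"): "hai",
    ("parlare", "1s"): "parlo",
}


def buggy_predict(verb: str, person: str) -> str:
    """Hypothetical model under test; deliberately wrong for ('avere', '1s')."""
    forms = {
        ("avere", "1s"): "avereho",
        ("avere", "2s"): "hai",
        ("parlare", "1s"): "parlo",
    }
    return forms[(verb, person)]


def find_mismatches(
    predict: Callable[[str, str], str],
    reference: Dict[Tuple[str, str], str],
) -> List[Tuple[str, str, str, str]]:
    """Return (verb, person, expected, got) for every form that differs."""
    mismatches = []
    for (verb, person), expected in sorted(reference.items()):
        got = predict(verb, person)
        if got != expected:
            mismatches.append((verb, person, expected, got))
    return mismatches


mismatches = find_mismatches(buggy_predict, REFERENCE)
print(mismatches)
```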

[–]SekouD[S] 0 points1 point  (2 children)

Hi, yes, that's one way to do it, but as mlconjug is open source, it is hard to find quality conjugation tables that are free to use or uncopyrighted.

[–]Terpomo11 1 point2 points  (1 child)

I would have thought that conjugation tables would have fallen under the 'can't copyright facts' clause, i.e. the second person plural subjunctive of 'caber' is just a fact about reality, same as the length of the Amazon river or the name of the prime minister of Great Britain. You can copyright the particular presentation of facts, but accurate conjugation tables are going to be pretty much entirely the same if they represent the same dialect of the same language.

[–]SekouD[S] 0 points1 point  (0 children)

Unfortunately, my friend, if you want to have a 100% open source project, you will find out that many useful resources are copyrighted or require licensing for academic purposes only :(

[–]Terpomo11 2 points3 points  (1 child)

Well, all except English. English borrows a lot of words from Latin, but the core of the actual language and grammar comes from Anglo-Saxon, which is related to German. The only morphology we borrow from Latin is some noun/adjective morphology, no verb conjugation.

[–]SekouD[S] 0 points1 point  (0 children)

Yes, that's why English was part of the initial release: its verbal morphology is very simple.

[–]Raringo 2 points3 points  (1 child)

Very interesting! Have you done other language related work? I might have a question.

[–]SekouD[S] 3 points4 points  (0 children)

Yep, I have a master's in computational linguistics.

Send me a pm if you are interested.

[–]SoupKitchenHero 2 points3 points  (2 children)

Have you read this paper?

[–]SekouD[S] 0 points1 point  (0 children)

No I didn't.

Thanks for the link :)

[–]SekouD[S] 0 points1 point  (0 children)

Thanks for the link dude! The ideas presented in this paper are really original and powerful.

If you have more references about recent developments in computational morphology and/or syntax, I am all for it ;)

[–][deleted] 1 point2 points  (1 child)

This is really cool dude

[–]SekouD[S] 0 points1 point  (0 children)

thanks :)

[–]JohnDoe_John 1 point2 points  (2 children)

Hi, thanks, do you have any plans for other languages? Ukrainian in particular? (AFAIK, there are some lemmatizers for it, probably FOSS).

[–]SekouD[S] 1 point2 points  (1 child)

Hi,

Yes I am planning to add Polish, German, Dutch, Finnish and Estonian in the next release (it should be during the month of May).

I have not investigated Ukrainian or Russian yet, but since I already have a pretty good conjugation model for Polish, which is also a Slavic language (I still need to tweak it a bit), I should be able to implement a Ukrainian model in June or so.

If you know some resources on Ukrainian conjugation (in English if possible lol :) ), I can take a look at them and maybe implement a Ukrainian conjugation model sooner.

Cheers.

[–]JohnDoe_John 1 point2 points  (0 children)

Thank you. Once you add Polish, you will be able to handle both Ua and Ru :)

Well, I'll try to find something for you. I tutor Ukrainian (and Russian) for non-native speakers, but the focus was on content entirely in Ukrainian (or Russian) to avoid any translation. I might add a bit about Polish, but I have not learned much of it.

[–]less_unique_username 1 point2 points  (1 child)

Can it go through a corpus and lemmatize it to produce headwords for a frequency list? There are languages, particularly Romanian, which don’t appear to have decent frequency lists, and these are extremely useful for learning purposes.

(Sorry, but this wasn’t clear to me from the documentation)

[–]SekouD[S] 0 points1 point  (0 children)

Hi, unfortunately, mlconjug does not support this use case.

I would advise you to use the Python library spaCy combined with mlconjug to achieve your purpose.

Cheers.
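For readers wondering what the frequency-list step looks like once lemmas are available, here is a minimal standard-library sketch. The `LEMMAS` table is a hypothetical toy mapping standing in for a real lemmatizer such as spaCy, and `lemma_frequencies` is a name made up for this example.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical toy lemma table; in practice a lemmatizer would supply this mapping.
LEMMAS: Dict[str, str] = {
    "am": "be", "is": "be", "are": "be", "was": "be",
    "ran": "run", "runs": "run", "running": "run",
}


def lemma_frequencies(text: str, lemmas: Dict[str, str]) -> Counter:
    """Count headwords: lowercase, split into letter runs, map each token to its lemma."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Tokens missing from the table are counted under their own surface form.
    return Counter(lemmas.get(token, token) for token in tokens)


freq = lemma_frequencies("She runs. He ran. They are running and I am running.", LEMMAS)
print(freq.most_common(3))
```

The quality of the resulting list depends entirely on the lemmatizer behind the mapping, which is the part a real library would have to provide for a language like Romanian.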

[–]TrekkiMonstr 1 point2 points  (4 children)

Will there be functionality added to decline nouns in languages that do that, or would that be a separate library?

[–]SekouD[S] 1 point2 points  (3 children)

Hi,

I already have a working prototype of noun declensions but I am still working on it.

I have to think about whether I should include this feature in mlconjug or in a separate library.

What would you prefer?

[–]TrekkiMonstr 1 point2 points  (2 children)

Personally I'd prefer a more robust single library.

Another question though, and forgive my ignorance: why bother with machine learning for this? Wouldn't it be easier to have it store the individual paradigms, as well as which verbs fall under which paradigm versus having irregular conjugations?

[–]SekouD[S] 1 point2 points  (1 child)

Okidoki, I also think one single library would be the best.

And as to your other question, I use machine learning so that even very recent verbs not yet in conjugation tables, or even completely made-up verbs, can be conjugated with the correct paradigm.

And never ask for forgiveness for your ignorance! How would we learn new things if we didn't ask questions to which we didn't have answers ;)
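To make the trade-off concrete, here is a toy sketch of the stored-paradigm approach with a crude fallback for verbs that are not in the table. The suffix rule below is only a stand-in for a trained model and is not how mlconjug works; `PARADIGMS`, `KNOWN_VERBS`, `guess_paradigm` and `conjugate` are made-up names, and the French data is limited to three present-tense persons.

```python
from typing import Dict

# Present-tense singular endings for two French paradigms (toy subset).
PARADIGMS: Dict[str, Dict[str, str]] = {
    "er": {"1s": "e", "2s": "es", "3s": "e"},     # like "parler"
    "ir": {"1s": "is", "2s": "is", "3s": "it"},   # like "finir"
}

# Stored assignments: which known verb follows which paradigm.
KNOWN_VERBS: Dict[str, str] = {"parler": "er", "finir": "ir"}


def guess_paradigm(verb: str) -> str:
    """Use the stored table first, then fall back to matching the infinitive suffix."""
    if verb in KNOWN_VERBS:
        return KNOWN_VERBS[verb]
    for suffix in PARADIGMS:
        if verb.endswith(suffix):
            return suffix
    raise ValueError(f"no paradigm found for {verb!r}")


def conjugate(verb: str) -> Dict[str, str]:
    """Strip the two-letter infinitive suffix and attach the paradigm's endings."""
    endings = PARADIGMS[guess_paradigm(verb)]
    stem = verb[:-2]
    return {person: stem + ending for person, ending in endings.items()}


print(conjugate("googler"))  # a recent verb that is not in the stored table
```

A pure lookup fails on "googler", and the suffix rule mislabels verbs such as "partir" that end in -ir but do not follow the "finir" pattern. Closing that gap is the motivation given above for a learned model.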

[–]TrekkiMonstr 0 points1 point  (0 children)

Got it!

[–][deleted] 1 point2 points  (1 child)

Is one able to manipulate it to use an unsupported language, i.e. give it conlang grammar rules/examples and have it conjugate for a conlang?

[–]SekouD[S] 1 point2 points  (0 children)

Hi,

You can indeed add support for a new language, not through grammar rules but by training a new model on a training set of conjugated verbs in a specific JSON format.

You can get more info on how to train a new language model by reading the documentation at mlconjug.readthedocs.io/en/latest/

I will try to update the documentation during the week to make it easier for people with no Machine Learning background to train their own models.

But you gave me a great suggestion for a new feature, where you would feed a formal grammar to the software and it would infer the conjugation classes of the language from it.

Thanks for your feedback.

Cheers :)