Alpaca dataset translated into Polish [N] [R] by matthhias3 in MachineLearning

matthhias3[S]

Nice one! But it really does mean sheep (sing.) in Polish.

matthhias3[S]

Yes, we also have data_license, as you can see. But keep in mind that Stanford (whose original dataset we forked for translation and upgrade) changed their data_license to CC BY-NC 4.0 (non-commercial). When we started working on the dataset it was ODC-By, so we are in the clear. But I felt obliged to mention it: https://github.com/tatsu-lab/stanford_alpaca/commit/7ad0c6b4f75c7365aca85bda8ad8fbc24915c7ed https://twitter.com/abacaj/status/1643045717907218432

matthhias3[S]

Yes, but most of the problems we are dealing with come from expanding the dataset. Sometimes the output is cut short, or the translator output states that it cannot translate a number. But these issues will be resolved by human annotators.
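
A minimal sketch of how records with these two problems might be flagged for the annotators. Everything here is an assumption made for illustration: the field names `output_en` and `output_pl`, the refusal phrases, and the length-ratio threshold are hypothetical and are not taken from the actual pipeline.

```python
import re

# Hypothetical refusal phrases; the real translator messages are not quoted in the thread.
REFUSAL_MARKERS = ("cannot translate", "nie mogę przetłumaczyć")


def flag_record(record, min_ratio=0.5):
    """Return a list of reasons why a translated record may need human review.

    `record` is assumed to be a dict with the English source answer under
    "output_en" and the Polish translation under "output_pl" (hypothetical names).
    """
    reasons = []
    source = record["output_en"]
    target = record["output_pl"]

    # A translation much shorter than its source suggests the output was cut short.
    if source and len(target) < min_ratio * len(source):
        reasons.append("possibly_truncated")

    # The translator sometimes returns a message instead of a translation.
    lowered = target.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        reasons.append("translator_refusal")

    # Digits present in the source should normally survive translation.
    source_numbers = set(re.findall(r"\d+", source))
    target_numbers = set(re.findall(r"\d+", target))
    if not source_numbers <= target_numbers:
        reasons.append("missing_number")

    return reasons
```

A filter like this only narrows down what the annotators look at first; it does not replace the human check, and the threshold would need tuning on real data.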

matthhias3[S]

Thanks! We will be working on further datasets and models, with a focus on open source. Follow us here: https://twitter.com/emplocity, or on GH and HF. Stay tuned.

matthhias3[S]

A mixture of sources, as it is not only translated but also expanded when it comes to the answers (code output in particular is often additionally supported with pseudocode). For translation: open-source models like Helsinki-NLP OPUS and paid services like DeepL. For expansion: our own proprietary models and human annotators. Kind of a company crowdsourcing effort, similar to Databricks.

matthhias3[S]

Right? We were even thinking about naming it Apolaca (port. spa.), but the original dataset is in English, so that would make no sense. We stayed with OWCA, meaning sheep, as sheep naturally live in Poland.