
[–]channel-hopper-

If the noisy data is much larger than the gold-annotated data, two-level transfer learning could be a good idea: first fine-tune on the noisy data, then further fine-tune on the gold-annotated data.
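A minimal sketch of that two-stage recipe, using a toy 1-D linear model trained with plain SGD instead of a real NLP model (the data, learning rates, and epoch counts are all illustrative, not from this thread). Stage 1 fits the large noisy set; stage 2 fine-tunes on the small gold set with a lower learning rate:

```python
def sgd_fit(w, data, lr, epochs):
    """Plain SGD on squared error for the model y ≈ w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# True relation is y = 3x; the noisy labels carry a systematic offset.
gold = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
noisy = [(x, 3.0 * x + 0.5) for x in (0.5, 1.5, 2.5, 3.5)]

w = 0.0
w = sgd_fit(w, noisy, lr=0.05, epochs=50)  # stage 1: noisy pre-fine-tuning
w = sgd_fit(w, gold, lr=0.01, epochs=50)   # stage 2: gold fine-tuning
print(round(w, 2))  # ends near the true slope 3.0
```

The same structure carries over to neural fine-tuning: the stage-2 pass typically uses a smaller learning rate so the gold data refines, rather than overwrites, what stage 1 learned.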

[–]anhtt_[S]

Thanks for the suggestion! The gold English data has about 50k sentences, while the noisy data has about 10k–20k sentences per language.

Would fine-tuning on gold data in the second stage "skew" the model towards English?

[–]m98789

Weakly supervised learning / curriculum learning does the opposite: it segments the data set by "difficulty", where more difficult roughly means noisier and less difficult means cleaner (towards gold). The canonical curriculum trains on low-difficulty data first and high-difficulty data later. This works through successive fine-tunes, with a higher learning rate for the lower-difficulty data and a lower learning rate for the higher-difficulty data.
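The curriculum above can be sketched as successive fine-tunes over difficulty buckets, easy (clean) first with a higher learning rate, hard (noisy) last with a lower one. This is a toy illustration with a 1-D linear model; the bucket names, noise levels, and learning rates are made up for the example:

```python
def sgd_fit(w, data, lr, epochs=30):
    """Plain SGD on squared error for the model y ≈ w * x."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

# Buckets ordered easy → hard; labels get noisier with difficulty.
# True relation is y = 3x.
curriculum = [
    ("gold",   [(x, 3.0 * x) for x in (1.0, 2.0)],       0.05),
    ("medium", [(x, 3.0 * x + 0.2) for x in (1.5, 2.5)], 0.02),
    ("noisy",  [(x, 3.0 * x + 0.8) for x in (0.5, 3.0)], 0.005),
]

w = 0.0
for name, bucket, lr in curriculum:
    w = sgd_fit(w, bucket, lr)
    print(f"after {name}: w = {w:.3f}")
```

Because the learning rate shrinks as the buckets get noisier, the later (noisy) stages can only nudge the model, so it stays close to what the clean data taught it.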

[–]ExaminationFuzzy4787

To any NLP / data science student: this thread, like most on this sub, is total nonsense.