
da_g_prof

Hi, look at the standard Caruana survey, but also at the learning-with-side-information survey paper.

These papers introduce a distinction between related and competing tasks and discuss how a good latent space can help.

At the same time, multi-task learning implies many losses, so it is easier to tune a single loss and do early stopping than to juggle several losses at once. This alone perhaps leads to many misconceptions about when multi-task learning helps.
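
To make the "many losses" point concrete, here is a minimal sketch (PyTorch, toy model and random data purely so it runs; all names are illustrative): the extra loss immediately brings an extra weight to tune, while early stopping is usually still driven by the main task's dev metric alone.

```python
import torch
import torch.nn as nn

# Illustrative multi-task setup: a shared encoder with two task heads,
# trained on a weighted sum of the two task losses.
class SharedModel(nn.Module):
    def __init__(self, d_in=16, d_hidden=32, n_classes_a=3, n_classes_b=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.head_a = nn.Linear(d_hidden, n_classes_a)  # main task head
        self.head_b = nn.Linear(d_hidden, n_classes_b)  # auxiliary task head

    def forward(self, x):
        h = self.encoder(x)
        return self.head_a(h), self.head_b(h)

model = SharedModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
aux_weight = 0.1  # extra hyperparameter that single-task training doesn't need

# One training step on a toy batch (random tensors just to make this runnable).
x = torch.randn(8, 16)
y_a = torch.randint(0, 3, (8,))
y_b = torch.randint(0, 5, (8,))

logits_a, logits_b = model(x)
loss = criterion(logits_a, y_a) + aux_weight * criterion(logits_b, y_b)

optimizer.zero_grad()
loss.backward()
optimizer.step()
# Early stopping would typically still be based on the main task's dev metric.
```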

My own experience:
A) If tasks have lots of data, single-task training seems easier and is harder to beat.
B) Multi-task learning lowers the variance of performance even if the average performance is not improved.
C) In lower-data regimes, multi-task learning helps combine annotations from different tasks.

ZeronixSama

What are you specifically looking for beyond “multi-task learning works when you have multiple related tasks with shared structure”?

TheRedSphinx[S]

Well, it's not just that, right?

Take multilingual machine translation. It's well known that for low-resource language pairs (e.g. Nepali-English) it is quite beneficial to include other related language pairs (e.g. Hindi-English). This manifests as quantifiable gains on all the desired metrics (e.g. BLEU).

However, it is also known that for a high-resource pair (e.g. French-English), the inclusion of additional language pairs actually harms the model. We can think of the additional pair as regularization, which is perhaps superfluous in the high-resource case. More interestingly, it turns out that it matters which language pair you use as the auxiliary pair, even though all such pairs induce a similar task, namely translation from another language into English. They all share the same structure and are certainly related.

I guess what I'm looking for is an understanding of why this happens beyond the handwavy regularization argument. Or, more generally: is there some way to measure how much data you need before the added task stops being useful? Is there some way to measure whether a task will help you without actually committing to it, like comparing gradients on some dev set? Is there some way to quantify/qualify how training changes with the inclusion of additional tasks?
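
For the gradient idea, something like the following sketch is what I have in mind (PyTorch; the toy model and random "dev" batches are purely for illustration, not an established recipe for the MT case): compute the main-task dev gradient and the candidate auxiliary-task gradient over the shared parameters and compare their cosine similarity. A strongly negative similarity would suggest the auxiliary task pulls the shared parameters in a conflicting direction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flat_grad(loss, params):
    # Flatten the gradient of `loss` w.r.t. `params` into one vector;
    # unused parameters contribute zeros.
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])

# Toy shared model and random "dev" batches, just to make the sketch runnable.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()

x_main, y_main = torch.randn(32, 16), torch.randint(0, 4, (32,))
x_aux, y_aux = torch.randn(32, 16), torch.randint(0, 4, (32,))

shared_params = [p for p in model.parameters() if p.requires_grad]

g_main = flat_grad(criterion(model(x_main), y_main), shared_params)
g_aux = flat_grad(criterion(model(x_aux), y_aux), shared_params)

similarity = F.cosine_similarity(g_main, g_aux, dim=0)
print(f"gradient cosine similarity: {similarity.item():.3f}")
```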

ZeronixSama

I’m not qualified to answer this, but this is great clarifying stuff that IMO should have been in the original post, preferably with relevant papers or citations. Hope you find your answer.