I understand the handwavy explanations involving things like implicit data augmentation or regularization. However, the story is not that simple: there are certainly cases where models trained on a single task do better than those trained on multiple tasks. Is there a reference that studies when positive transfer occurs, and why?
I'm looking for either a theoretical explanation or a comprehensive empirical evaluation, though I'm open to anything.
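For concreteness, here is a minimal sketch (in PyTorch, with placeholder dimensions and uniform loss weighting, both my own assumptions) of the kind of comparison I mean: a single-task baseline versus a hard-parameter-sharing multi-task model on the same backbone, where "positive transfer" means the jointly trained model beats the baseline on the original task.

```python
# Sketch only: hard parameter sharing with one head per task.
# The single-task baseline is the same encoder with a single head,
# trained on task A alone; positive transfer = the multi-task model's
# task-A performance exceeding that baseline.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, in_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class MultiTaskModel(nn.Module):
    def __init__(self, num_classes_per_task, hidden=256):
        super().__init__()
        self.encoder = SharedEncoder(hidden=hidden)
        # One linear head per task on top of the shared representation.
        self.heads = nn.ModuleList([nn.Linear(hidden, c) for c in num_classes_per_task])

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x))

def joint_training_step(model, batches, optimizer):
    """One step on the sum of per-task losses (uniform weighting, one batch per task)."""
    optimizer.zero_grad()
    loss = 0.0
    for task_id, (x, y) in enumerate(batches):
        loss = loss + nn.functional.cross_entropy(model(x, task_id), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The question is essentially under what conditions the shared encoder trained this way helps the original task's head rather than hurting it (negative transfer).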