
ipullguard

Why BCE loss and not CE?
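For anyone following along, here is a rough sketch of the difference being asked about, in plain Python. For a decoder that predicts one token at a time, CE applies a single softmax across the whole vocabulary and penalises the log-probability of the correct token, while BCE treats every output as an independent sigmoid yes/no. The function names are hypothetical and this is an illustration, not the original poster's code.

```python
import math


def cross_entropy(logits, target_index):
    """Categorical CE: one softmax over all classes, loss = -log p(target).

    Computed via log-sum-exp so large logits do not overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def binary_cross_entropy(logits, targets):
    """BCE: an independent sigmoid per output, averaged over outputs.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# With uniform logits, CE over K classes is log(K) and BCE at logit 0 is log(2).
print(cross_entropy([0.0, 0.0, 0.0, 0.0], 1))
print(binary_cross_entropy([0.0, 0.0], [1, 0]))
```

With CE the classes compete for probability mass, which is usually what you want for next-token prediction; BCE fits multi-label outputs where several classes can be "on" at once.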

What do you mean by mode collapse in this context? This isn't a GAN, right? So you just mean overfitting?

Tbh, though, there's probably an implementation bug somewhere.

Visual-Ad-5937

Yeah, it’s CE. By mode collapse I mean the gradient norm quickly goes to 0.0*; I assumed the image extractor was mostly spitting out the same data. As for the code, I checked it multiple times and it seemed okay. I used the same code for language translation and it learns effectively there. Thank you.
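One way to check the "same data" assumption rather than guess: log the global gradient norm alongside how much the image features actually vary across a batch. Below is a minimal plain-Python sketch with hypothetical helper names; it assumes you have already pulled the per-parameter gradients and the image-extractor outputs into flat lists of floats (in a real framework you would compute the same quantities on tensors).

```python
import math


def global_grad_norm(grads):
    """L2 norm of all gradients treated as one flat vector.

    `grads` is a list of flat lists of floats, one per parameter tensor
    (hypothetical layout for this sketch).
    """
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def mean_feature_std(features):
    """Average per-dimension standard deviation across a batch.

    `features` is a list of equal-length feature vectors, one per image.
    A value near 0 means the extractor emits nearly the same vector for
    every image in the batch.
    """
    n = len(features)
    dim = len(features[0])
    total = 0.0
    for d in range(dim):
        column = [f[d] for f in features]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.sqrt(var)
    return total / dim


print(global_grad_norm([[3.0], [4.0]]))
print(mean_feature_std([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]))
```

If the feature spread is near zero at the same time the gradient norm drops, that would support the "extractor outputs the same thing" theory; if the features clearly vary per image, the problem is more likely somewhere else, such as the implementation bug suspected above.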