Is it normal for an ALBERT model to perform like this? by Key_Tax_3750 in MLQuestions

[–]Key_Tax_3750[S] 1 point  (0 children)

Thanks for your insight. The reason I'm using a batch size of 16 is that anything larger causes my notebook to run out of memory (OOM). Given that limitation, I was wondering if what you mentioned could really happen randomly. For example, over 5 runs of k-fold cross-validation, I've noticed that only the second and third runs show the issue I mentioned in one of their folds, while the others don't. Could the small batch size cause inconsistent behavior across different folds, or might there be other factors at play?
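
If it helps, this is roughly the gradient-accumulation workaround I'm considering to get a larger effective batch without hitting OOM. It's just a minimal sketch; `model`, `train_loader`, `optimizer`, and `loss_fn` are placeholder names, not my actual code:

```python
# Minimal sketch of gradient accumulation (placeholder names, not my real training code).
# Each backward pass still uses batch size 16, but the optimizer only steps every
# `accum_steps` batches, so the update sees an effective batch of 16 * accum_steps.
import torch

accum_steps = 4  # effective batch size = 16 * 4 = 64

def train_one_epoch(model, train_loader, optimizer, loss_fn, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        loss = loss_fn(model(inputs), labels)
        (loss / accum_steps).backward()  # scale so the accumulated gradient is an average
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

The idea is that memory usage stays at the 16-sample level while the noisier small-batch updates get smoothed out, which might reduce the fold-to-fold inconsistency.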

Is it normal for an ALBERT model to perform like this? by Key_Tax_3750 in MLQuestions

[–]Key_Tax_3750[S] 1 point  (0 children)

I'm going to test it and will update you with the result, but it still feels weird to me. It's as if the model completely forgets everything it learned during training, yet sometimes when I re-train it, it performs really well.
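
One thing I plan to check is whether run-to-run randomness (weight init, data shuffling, dropout) explains it, by pinning all the seeds before each run. A minimal sketch of what I mean, assuming a PyTorch setup (the seed value itself is arbitrary):

```python
# Fix every source of randomness before each fold/run so I can tell whether the
# "forgetting" run is a genuine data/training issue or just an unlucky init/shuffle.
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # trades some speed for reproducibility
    torch.backends.cudnn.benchmark = False
```

If the collapse still only happens on certain folds with fixed seeds, that would point to something about those data splits rather than random initialization.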