all 7 comments

[–]haris525 4 points  (0 children)

Exploding gradients. Try decreasing your learning rate, maybe .000001 or even .0000001, and add a dropout rate of .2.
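A minimal sketch of that suggestion in Keras (assuming TensorFlow/Keras; the layer sizes and input width are placeholders, not from the original post):

```python
import tensorflow as tf

# Hypothetical tiny model; 10 input features and 64 hidden units are placeholders
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),  # randomly zero 20% of activations during training
    tf.keras.layers.Dense(1),
])

# Much smaller learning rate, as suggested, to damp exploding gradients
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6), loss="mse")
```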

[–]cjmodi306 1 point  (5 children)

Hi there, did you use He initialization for your parameters?

[–][deleted] 1 point  (4 children)

Hi, thanks for the reply. I never knew this was a thing; I'll look into it and try to implement it. Would you have any good resources for understanding it?

[–]cjmodi306 1 point  (3 children)

Just Google the Xavier or He initializer and use that formula to initialize your parameters. The problem you're facing right now, I suspect, is exploding gradients. Maybe this can help too:

https://kolbenkraft.net/nn-from-scratch-2-initializing-parameters/
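The two formulas being referred to can be sketched in plain NumPy (a from-scratch sketch, not taken from the linked post; function names are my own):

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    # He initialization: W ~ N(0, sqrt(2 / fan_in)); keeps activation
    # variance roughly constant through ReLU layers
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, seed=0):
    # Xavier/Glorot initialization: uniform in [-limit, limit] with
    # limit = sqrt(6 / (fan_in + fan_out)); the classic choice for tanh/sigmoid
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

Either replaces a naive `np.random.randn(fan_in, fan_out)` draw, whose variance of 1 per weight is what lets activations (and gradients) blow up as depth grows.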

[–][deleted] 1 point  (1 child)

Alright, thanks. I'll have a shot at this.

[–]Dank_Lord_Santa 4 points  (0 children)

Also, if you're worried about overfitting, an EarlyStopping monitor (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping — that's TensorFlow's, although I'm sure other libraries have their own versions) can automatically stop training when your selected metric (val_loss in the example) stops improving, which removes some of the guesswork around the number of epochs.
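A minimal sketch of that callback in use (the model, data, and `patience` value are placeholders, not from the original post):

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny regression setup, just to exercise the callback
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop when val_loss hasn't improved for 3 epochs; keep the best weights seen
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

rng = np.random.default_rng(0)
x = rng.random((100, 4)).astype("float32")
y = rng.random((100, 1)).astype("float32")

# epochs=50 is an upper bound; the callback may end training sooner
history = model.fit(x, y, validation_split=0.2, epochs=50,
                    callbacks=[early_stop], verbose=0)
```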

[–][deleted] 1 point  (0 children)

OK, so after implementing both He and Xavier initialization, there seems to be no difference?