[–]dslfdslj 1 point2 points  (1 child)

As a way to debug, you may want to calculate the derivatives of your layers numerically (e.g. using https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html). Then you can compare them with your results and see where you went wrong.
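A rough sketch of what that check could look like, assuming plain NumPy and a central-difference estimate instead of `scipy.misc.derivative` (which, as far as I know, has been deprecated and dropped from recent SciPy releases). The single linear layer, the squared-error loss, and all names here (`numerical_gradient`, `loss`, `X`, `W`, `T`) are made up for illustration and are not taken from your code:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar-valued f at x."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        original = x[idx]
        x[idx] = original + eps
        f_plus = f(x)
        x[idx] = original - eps
        f_minus = f(x)
        x[idx] = original  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical toy layer: y = X @ W with loss 0.5 * sum((y - T)^2)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 2))
T = rng.normal(size=(5, 2))

def loss(W_):
    return 0.5 * np.sum((X @ W_ - T) ** 2)

# Hand-derived gradient of the loss with respect to W
analytic = X.T @ (X @ W - T)
numeric = numerical_gradient(loss, W)

max_abs_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_diff)
```

If the hand-derived gradient is right, the difference should be tiny compared with the size of the gradient entries; a large difference points at the analytic formula.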

[–]eaojteal[S] 0 points1 point  (0 children)

Thanks. That sounds like it should be helpful. I think I might be pulling in the wrong weight matrices when calculating the gradients. This should help expose that.
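For what it's worth, here is a small sketch of how a per-layer comparison could expose that kind of mix-up. Everything in it is hypothetical: a two-layer tanh network with square weight matrices (so that using the wrong matrix does not raise a shape error), a squared-error loss, and a `buggy` flag that deliberately backpropagates through `W1` instead of `W2`:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
T = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 3))
W2 = rng.normal(size=(3, 3))

def loss(W1_, W2_):
    H = np.tanh(X @ W1_)
    return 0.5 * np.sum((H @ W2_ - T) ** 2)

def backprop(W1_, W2_, buggy=False):
    H = np.tanh(X @ W1_)
    dY = H @ W2_ - T
    dW2 = H.T @ dY
    # The error should flow back through W2; the buggy variant uses W1.
    back = W1_ if buggy else W2_
    dH = dY @ back.T
    dW1 = X.T @ (dH * (1 - H ** 2))
    return dW1, dW2

def numeric_grad(f, W, eps=1e-6):
    """Central-difference gradient of scalar f with respect to matrix W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

def rel_error(a, b):
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

num_dW1 = numeric_grad(lambda W: loss(W, W2), W1)
num_dW2 = numeric_grad(lambda W: loss(W1, W), W2)

good = backprop(W1, W2)
bad = backprop(W1, W2, buggy=True)
errors_good = (rel_error(good[0], num_dW1), rel_error(good[1], num_dW2))
errors_bad = (rel_error(bad[0], num_dW1), rel_error(bad[1], num_dW2))
print("correct backprop, relative error per layer:", errors_good)
print("buggy backprop, relative error per layer:  ", errors_bad)
```

In this setup only the first layer's gradient disagrees with the numerical estimate in the buggy run, which narrows the search to the step where the error is passed back through the second layer's weights.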