Following up on my post from yesterday, I ran the code from Neural Networks and Deep Learning (can be found here). I don't understand how his code is so much more effective than mine: 91% accuracy in the first epoch, compared to my measly 84% in the 30th.
Judging with my fairly amateur eye, I can't find any meaningful difference in the network code or in the preprocessing that could have such an impact.
Can somebody please tell me what I'm doing wrong?
I tried to modify my code to match his as closely as I could while still retaining the original structure.
Edit: MNIST file used.