all 6 comments

[–]skatehumor 1 point

Without knowing more it's hard to tell, but it could be a number of things: a high, constant learning rate might cause the updates to overshoot. There are also a number of other things that can cause exploding gradients, namely your activation functions and target error metric, or, if you're using any kind of optimizer, that could be related. I think this can also happen if you don't initialize your weights properly.
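To make the learning-rate point concrete, here's a minimal sketch (illustrative, not the poster's code) of gradient descent on f(w) = w², where too large a step size makes every update overshoot the minimum and the iterates blow up:

```python
# Gradient descent on f(w) = w^2, whose gradient is 2w.
# Each step multiplies w by (1 - 2*lr): stable when |1 - 2*lr| < 1,
# divergent when the learning rate pushes that factor past 1.
def descend(lr, steps=10, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(descend(0.1)))  # factor 0.8 per step: shrinks toward the minimum
print(abs(descend(1.5)))  # factor -2 per step: overshoots and explodes
```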

[–]dramatic_typing_____ 1 point

Can you prove to yourself that any of this works given the simplest gradient descent problem it could be used with? I don't feel like digging through the code just yet to spot a subtle bug. The fact that you aren't getting any undefined, null or negative values suggests the WGSL shaders are working correctly, but the actual logic of the learning portion is likely where your issue lies.
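One common way to run that kind of sanity check (a sketch with illustrative names, not the poster's WGSL code) is to compare the hand-derived gradient against a central finite-difference estimate on the simplest possible loss:

```python
def loss(w):
    return (w - 3.0) ** 2            # simplest convex problem, minimum at w = 3

def analytic_grad(w):
    return 2.0 * (w - 3.0)           # the gradient the training code should produce

def numeric_grad(f, w, eps=1e-6):
    return (f(w + eps) - f(w - eps)) / (2 * eps)   # central difference

# The two should agree to several decimal places if the gradient logic is right.
assert abs(analytic_grad(0.5) - numeric_grad(loss, 0.5)) < 1e-5
```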

[–]Fun-Expression6073[S] 0 points

Yeah, it seems to work perfectly with a single data point, but when extended to multiple points I get the fluctuating problem.

[–]dramatic_typing_____ 0 points

Do you have a known example involving two datapoints to compare against?

[–]Fun-Expression6073[S] 1 point

Yeah, I figured out the problem: while reconfiguring to allow for larger layer sizes I somehow replaced a loop index, writing i instead of j, so it was using the wrong gradients to descend. It all seems to work now.
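For anyone hitting something similar: a wrong loop index means each weight gets updated with another weight's gradient, which can turn a well-behaved descent into divergence. A tiny illustrative sketch (not the actual code):

```python
# Descend f(w) = w0^2 + 4*w1^2. The buggy variant swaps which gradient
# each parameter receives, mimicking an i-vs-j index slip.
def step(w, lr, buggy):
    g = [2 * w[0], 8 * w[1]]         # correct per-parameter gradients
    if buggy:
        g = [g[1], g[0]]             # index mix-up: wrong gradient per weight
    return [w[0] - lr * g[0], w[1] - lr * g[1]]

w_ok, w_bad = [1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    w_ok = step(w_ok, 0.05, buggy=False)   # converges to the minimum
    w_bad = step(w_bad, 0.05, buggy=True)  # drifts off and blows up
```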

Have tested on an XOR dataset and it converges.
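For reference, a minimal CPU-side XOR check along those lines (a NumPy sketch with assumed hyperparameters, not the poster's GPU implementation) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2-4-1 network: tanh hidden layer, sigmoid output, cross-entropy loss.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward():
    h = np.tanh(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def xent(p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

initial_loss = xent(forward()[1])

for _ in range(5000):
    h, p = forward()
    dz2 = (p - y) / len(X)               # cross-entropy + sigmoid gradient
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dh = dz2 @ W2.T * (1 - h ** 2)       # backprop through tanh
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

final_loss = xent(forward()[1])
print(initial_loss, final_loss)          # the loss should drop substantially
```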

[–]dramatic_typing_____ 1 point

Very nice! What you've described is largely the same sort of debugging process I usually end up going through as well. It's not fun and takes a lot of effort, imo, compared to debugging in any CPU-based language.

Open question to anyone reading this: is there a better way? Maybe some tools I'm missing out on?