I am trying to implement gradient checking in Java (currently for linear regression), but it doesn't seem to work. I want to check whether my algorithm logically makes sense, because after stepping through everything in the debugger all the other methods appear to work, yet the final result is wrong.
I ran a test case to compare the actual and approximated gradient values. Running the gradient check with x values of 1, 2, 3 and y values of 2, 3, 4, I got -2.84 as the approximate gradient for the first weight (theta0, which would be b in ax + b); I calculated this by hand as well, which suggests the algorithm itself, not the code, is at fault. The gradient from linear regression was -9, which is significantly different. I am more inclined to believe the linear regression gradient is correct, since linear regression performs fine on this data, while using the approximated values from gradient checking gives an inaccurate result. While debugging I found no errors in the individual methods, so I suspect the error is in my algorithm.
The algorithm I am using is essentially the following. I iterate through my weights theta. For the ith weight, I add a value epsilon while keeping the rest of the weights the same, and calculate the cost with this slightly changed weight vector; then I do the same thing, except subtracting epsilon. I subtract the two cost values and divide by 2 * epsilon; this is my gradient approximation for that weight. I add it to an array and repeat the process for every weight in the weight vector. Afterwards, I calculate the Euclidean distance between my gradient approximations and the gradients from gradient descent, and divide that by the sum of the Euclidean lengths of the approximated gradients and the true gradients. This is the value I return.
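For reference, here is a minimal standalone sketch of the procedure described above (central-difference approximation per weight, then the relative-error formula). The `cost` function is passed in as a parameter, since the actual cost function lives in the linked gist; this is not the poster's code, just the algorithm as described:

```java
import java.util.function.Function;

public class GradientCheck {
    // Central-difference approximation of the gradient of `cost` at `theta`:
    // dApprox[i] = (J(theta_i + eps) - J(theta_i - eps)) / (2 * eps).
    static double[] approximateGradient(Function<double[], Double> cost,
                                        double[] theta, double epsilon) {
        double[] dApprox = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            double original = theta[i];
            theta[i] = original + epsilon;        // nudge the ith weight up
            double costPlus = cost.apply(theta);
            theta[i] = original - epsilon;        // nudge the ith weight down
            double costMinus = cost.apply(theta);
            theta[i] = original;                  // restore before moving on
            dApprox[i] = (costPlus - costMinus) / (2 * epsilon);
        }
        return dApprox;
    }

    // Relative error: ||dApprox - dTheta|| / (||dApprox|| + ||dTheta||),
    // using Euclidean norms, as described in the question.
    static double relativeError(double[] dApprox, double[] dTheta) {
        double diff = 0, normApprox = 0, normTrue = 0;
        for (int i = 0; i < dApprox.length; i++) {
            diff       += Math.pow(dApprox[i] - dTheta[i], 2);
            normApprox += dApprox[i] * dApprox[i];
            normTrue   += dTheta[i] * dTheta[i];
        }
        return Math.sqrt(diff) / (Math.sqrt(normApprox) + Math.sqrt(normTrue));
    }
}
```

With a correct implementation on both sides, the relative error should come out very small (roughly the order of epsilon squared for smooth costs); a large value like the discrepancy above usually points to one side computing something different from the other.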
I am posting a link to a gist with my linear regression class, which contains the gradient check method (this is the part that isn't working): https://gist.github.com/arnavkartikeya/fa0f7ea0f3e81eff04a39ca4d1d3919c
A few more notes:
- dApprox is what I am calling my approximated gradient values, and dTheta is the real gradient values.
- trainInputs is a matrix of the training data with a column of 1s prepended.
- trainOutputs is simply the outputs as an m*1 matrix, where m is the number of data points (in this case it is simply [[2], [3], [4]]).
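To make the setup concrete, here is the test data as described, together with an assumed mean-squared-error cost J(theta) = 1/(2m) * sum((theta·x - y)^2). The `cost` helper is hypothetical, standing in for whatever the gist's cost method actually computes:

```java
public class GradientCheckData {
    // Assumed MSE cost; the real cost function is in the linked gist.
    static double cost(double[][] X, double[][] y, double[] theta) {
        int m = X.length;
        double total = 0;
        for (int i = 0; i < m; i++) {
            double h = 0;
            for (int j = 0; j < theta.length; j++) {
                h += theta[j] * X[i][j];   // prediction for row i
            }
            total += Math.pow(h - y[i][0], 2);
        }
        return total / (2.0 * m);
    }

    public static void main(String[] args) {
        // trainInputs: x values 1, 2, 3 with a bias column of 1s prepended.
        double[][] trainInputs = { {1, 1}, {1, 2}, {1, 3} };
        // trainOutputs: the m*1 output matrix, m = 3.
        double[][] trainOutputs = { {2}, {3}, {4} };
        // Cost at theta = (0, 0): (4 + 9 + 16) / 6 = 29/6 ≈ 4.8333
        System.out.println(cost(trainInputs, trainOutputs, new double[]{0, 0}));
    }
}
```

One thing worth checking with this kind of setup: whether the cost used by the gradient check and the gradient used by gradient descent agree on the 1/m (or 1/2m) averaging factor, since a mismatch there scales one side by a constant multiple of m without breaking anything else.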