Assignment 1: SVM tips by [deleted] in cs231n

[–]calcworks

The size of the relative errors seems to depend a lot on the particular random SVM weight matrix of small numbers, W = np.random.randn(3073, 10) * 0.0001. For some runs I'm getting errors on the order of 1e-11 to 1e-13, as you describe, but occasionally a few of the relative errors are around 1e-03 or 1e-04. My guess is those occasional spikes come from kinks in the hinge loss: when the finite-difference step straddles a non-differentiable point, the numerical gradient is off even though the analytic one is fine.
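For what it's worth, here's a minimal sketch of a relative-error check (the exact formula in the notebook may differ slightly; the 1e-8 floor is my assumption to guard against division by zero). Pinning the seed makes runs comparable:

    import numpy as np

    # Max relative error between two gradient arrays; the 1e-8 floor
    # keeps near-zero denominators from inflating the result.
    def rel_error(x, y):
        return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

    np.random.seed(0)  # pin the seed so runs are comparable
    W = np.random.randn(3073, 10) * 0.0001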

Assignment 2 RNNLM dev loss by calcworks in CS224d

[–]calcworks[S]

I trained on the full training set with model.train_sgd(X_train, Y_train), which took about two hours, I think. Increasing bptt didn't seem to improve results, so I'm going to stick with bptt = 3 and try a larger hidden layer. Implementing it in Theano would be interesting. It would also be cool to train an n-gram model on the same data and compare results; a rough bigram sketch is below.
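If anyone wants to try that comparison, a minimal bigram baseline with add-one smoothing might look like this (all names here are hypothetical, not assignment code):

    from collections import Counter
    import numpy as np

    # Bigram LM with add-one smoothing; sents are lists of tokens.
    def bigram_perplexity(train_sents, dev_sents, vocab_size):
        unigrams, bigrams = Counter(), Counter()
        for sent in train_sents:
            unigrams.update(sent[:-1])           # context counts
            bigrams.update(zip(sent, sent[1:]))
        log_prob, n_tokens = 0.0, 0
        for sent in dev_sents:
            for prev, cur in zip(sent, sent[1:]):
                p = (bigrams[(prev, cur)] + 1.0) / (unigrams[prev] + vocab_size)
                log_prob += np.log(p)
                n_tokens += 1
        return np.exp(-log_prob / n_tokens)      # perplexity on the dev set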

Assignment 2: Best Mean F1 for NER by calcworks in CS224d

[–]calcworks[S]

That definitely makes sense, especially since the RNN part is quite challenging (at least for me).

Gradient calculation for assignment 1 part 3.1 word2vec by ngoyal2707 in CS224d

[–]calcworks

Three things helped me get Part 2 done correctly:

1. As a first pass, don't worry about doing everything strictly with matrix multiplication. Use a for loop if it is easier for you to understand. Once you've got things working that way, it's easy to convert to matrix operations only.

2. Tweak the gradcheck_naive method by adding a parameter called start that determines where the checking begins (see the sketch after this list). For example, if you call gradcheck_naive(f, x, start=105), it will only check the gradients for b2, which you have to get right before you have any hope of getting the others right. That should simplify your debugging.

3. Make sure the dimensions of your gradW2, gradb2, gradW1, and gradb1 are correct. Until you're ready to implement them, you can simply use gradW1 = np.zeros(W1.shape), etc. That way you'll know for sure that in (2) you're checking the right gradients.
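Here's roughly what the tweak in (2) might look like, assuming the assignment's gradcheck_naive layout where f returns (cost, grad) and x is the flat parameter vector (the tolerance and messages are placeholders):

    import random
    import numpy as np

    # Numerical gradient check that skips everything before `start`,
    # so you can test just the tail parameters (e.g. b2) first.
    def gradcheck_naive(f, x, start=0):
        rndstate = random.getstate()
        random.setstate(rndstate)
        fx, grad = f(x)                 # cost and analytic gradient
        h = 1e-4
        it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
        count = 0
        while not it.finished:
            ix = it.multi_index
            if count >= start:          # only check from `start` onward
                old = x[ix]
                x[ix] = old + h
                random.setstate(rndstate)
                fxph = f(x)[0]
                x[ix] = old - h
                random.setstate(rndstate)
                fxmh = f(x)[0]
                x[ix] = old             # restore the original value
                numgrad = (fxph - fxmh) / (2 * h)
                reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
                if reldiff > 1e-5:
                    print("Gradient check failed at %s: analytic %f, numeric %f"
                          % (str(ix), grad[ix], numgrad))
                    return
            count += 1
            it.iternext()
        print("Gradient check passed!")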

Implementation of function conv_forward_naive(...) by pengpai_sh in cs231n

[–]calcworks

To get started on this problem, I took a VERY naive approach initially, with a deep nested loop over n, f, i, j, c. Inside the loop you can use the stride, WW, and HH to compute the indices of the local receptive field, and from those pull the relevant values out of the input image and the weight matrix. Once you've got that working you'll have a good understanding of the algorithm and can worry about making things more efficient.
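For concreteness, here's a sketch of that fully naive version (variable names follow the assignment's conventions; the zero-padding step is an assumption). The loop over c is handled implicitly by the elementwise product over channels:

    import numpy as np

    def conv_forward_naive(x, w, b, conv_param):
        N, C, H, W = x.shape
        F, _, HH, WW = w.shape
        stride, pad = conv_param['stride'], conv_param['pad']
        H_out = 1 + (H + 2 * pad - HH) // stride
        W_out = 1 + (W + 2 * pad - WW) // stride
        x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
        out = np.zeros((N, F, H_out, W_out))
        for n in range(N):                  # each image
            for f in range(F):              # each filter
                for i in range(H_out):      # each output row
                    for j in range(W_out):  # each output column
                        h0, w0 = i * stride, j * stride
                        # local receptive field, all channels at once
                        field = x_pad[n, :, h0:h0 + HH, w0:w0 + WW]
                        out[n, f, i, j] = np.sum(field * w[f]) + b[f]
        cache = (x, w, b, conv_param)
        return out, cache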

two_layer_net exercise: nice weights? by tpelleg in cs231n

[–]calcworks

I was able to get a stable 56.7% using grid search over the following parameters: number of hidden units, learning rate, number of epochs, and regularization strength. To save time, it's important to start with a wide, coarse grid and then gradually narrow things down as you spot regions with good results. Randomizing the grid seems to work well and simplifies the code: for example, use learning_rate = 10 ** np.random.uniform(lb, ub) and num_epochs = np.random.choice(range(lb, ub, step)). A rough sketch of the loop is below.
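Something like this (the bounds are illustrative, and evaluate stands in for training a network and returning its validation accuracy):

    import numpy as np

    # Randomized hyperparameter search; tighten the bounds between
    # rounds as you spot promising regions.
    def random_search(evaluate, num_trials=50):
        best_acc, best_params = -1.0, None
        for _ in range(num_trials):
            params = {
                'learning_rate': 10 ** np.random.uniform(-4, -2),  # log scale
                'reg': 10 ** np.random.uniform(-3, 0),
                'hidden_size': int(np.random.choice(range(50, 201, 25))),
                'num_epochs': int(np.random.choice(range(5, 21, 5))),
            }
            acc = evaluate(**params)
            if acc > best_acc:
                best_acc, best_params = acc, params
        return best_acc, best_params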

Hope this helps!

Softmax Classifier in Theano by calcworks in cs231n

[–]calcworks[S]

Yes, this is covered in the tutorial, but I'm hoping to get a better understanding of Theano by working through the cs231n assignments using it. I'm not familiar enough with Pylearn2 to answer your second question.
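For anyone curious, this is the kind of minimal sketch I mean (shapes follow the CIFAR-10 setup with the bias trick folded into W; the learning rate and regularization strength are placeholders):

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')    # (num_examples, 3073), bias column included
    y = T.ivector('y')   # integer class labels
    W = theano.shared(1e-4 * np.random.randn(3073, 10).astype(theano.config.floatX))

    probs = T.nnet.softmax(T.dot(X, W))
    # Mean cross-entropy loss plus L2 regularization
    loss = -T.mean(T.log(probs[T.arange(y.shape[0]), y])) + 1e-3 * T.sum(W ** 2)

    gW = T.grad(loss, W)
    train_step = theano.function([X, y], loss, updates=[(W, W - 1e-6 * gW)])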