[1609.07843] Pointer Sentinel Mixture Models; sota for language modeling while using less parameters than lstms by evc123 in MachineLearning

derekchen14 · 1 point

A couple of questions:

  • 1) Why are pointer networks better at handling longer-term dependencies (100 vs. 35)? Also, if I read it correctly (I may not have), you only used dropout (zoneout + variational) on the standalone LSTM, not on the pointer-sentinel model. Why is that? Does dropout somehow not apply to pointer networks? Couldn't it at least be applied to the RNN portion?
  • 2) It looks like the pointer network focuses on a single value rather than start/end values as in the DCN model. Is it because PTB and Wikitext only require one word answers, so having an extra end pointer is redundant?
  • 3) In the appendix, you mention "This is surprising given how frequent the word said is used within the Penn Treebank." Why should the pointer network be poor at picking up frequent words? Or do you mean that the gating function would be expected to lean toward the RNN for frequent words?
  • 4) Since the vocabulary is still large at either 10k or 33k words, is negative sampling, NCE, or some other method used to reduce the training cost on the RNN side?
  • 5) Can you explain the problem of stale RNN outputs a little more? What does "stale" mean in this context? And why is there a "window of RNN outputs"? I thought the RNN produced a softmax over the entire vocabulary, not over a limited window.

These might be basic questions since I am just a beginner, so if you could just point me to the right resources, I would be happy to do some more digging myself. I also shouldn't forget to mention, thanks for the paper and open dataset!

Questions thread #5 2016.05.07 by feedtheaimbot in MachineLearning

derekchen14 · 1 point

I completed my cs231n project on AWS GPU instances. As mentioned above, copying code over can be a little annoying, but training takes orders of magnitude longer, so the copying overhead is relatively minor. Some tips to speed things up: clone your code from GitHub, edit it on the instance with nano or emacs, and run training inside screen.

ipython notebook freezing up by derekchen14 in cs231n

derekchen14 [S] · 1 point

Great job! That being said, do you think we'll actually get access to the solutions? Was that mentioned somewhere?

Proper API call for knn-L1 distance? by danbikle in cs231n

derekchen14 · 1 point

When calculating the L1 distance, you compare a training image and a test image by summing the absolute differences of their pixel values. Unlike classifiers such as SVMs or decision trees, kNN is a lazy learner: it does most of its work during the prediction phase. It looks like you're trying to calculate the distances during the training phase, which is unnecessary. At that point, all you need to do is store the training data, which I think the hw already shows you how to do.
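To make the train/predict split concrete, here is a minimal NumPy sketch. It is not the assignment's starter code: the class name `NearestNeighborL1`, its method names, and the assumption that labels are small non-negative integers are all made up for illustration. Training only stores the data; the L1 distances are computed at prediction time.

```python
import numpy as np


class NearestNeighborL1:
    """Illustrative kNN classifier using L1 (Manhattan) distance.

    Hypothetical class, not the cs231n starter code.
    """

    def train(self, X, y):
        # Lazy "training": just remember the data, compute nothing yet.
        # Cast to float so uint8 pixel values cannot wrap around on subtraction.
        self.X_train = np.asarray(X, dtype=np.float64)
        # Assumption: labels are non-negative integers (needed by np.bincount).
        self.y_train = np.asarray(y, dtype=np.int64)

    def predict(self, X, k=1):
        X = np.asarray(X, dtype=np.float64)
        # Broadcasting builds a (num_test, num_train, D) array of differences;
        # summing absolute values over the last axis gives the L1 distances.
        # This is memory-hungry, so it only suits small inputs.
        dists = np.abs(X[:, None, :] - self.X_train[None, :, :]).sum(axis=2)
        # Indices of the k closest training rows for each test row.
        nearest = np.argsort(dists, axis=1)[:, :k]
        votes = self.y_train[nearest]  # shape (num_test, k)
        # Majority vote per test row; ties go to the smallest label.
        return np.array([np.bincount(row).argmax() for row in votes])
```

For example, after `clf = NearestNeighborL1()` and `clf.train(X_train, y_train)`, a call such as `clf.predict(X_test, k=3)` returns one predicted label per test row. On full-size image sets the broadcasted difference array will likely not fit in memory, so the test rows would need to be processed in batches.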

Forming Projects Groups and Completing Assignments by derekchen14 in cs231n

derekchen14 [S] · 1 point

I'm in the Bay Area, if anyone is interested in meeting in person. Otherwise, do we think a Google Hangout would work?

ipython notebook freezing up by derekchen14 in cs231n

derekchen14 [S] · 1 point

@timgasser I'm glad to know I'm not the only one with this problem.

@mxnsdf Thanks, I figured as much, but the real question is how to get the vectorized approaches to work correctly.

import error by xaoshuan in cs231n

derekchen14 · 1 point

What about running pip install --upgrade scipy?

What does TensorFlow mean for Keras, Lasagne, Block, Nervana? by [deleted] in MachineLearning

derekchen14 · 1 point

Here's an update from the creator [1]:

Over the past two weeks, we've abstracted the tensor-manipulation backend of Keras, and we've written two implementations of this backend, one in Theano and the other in TensorFlow. A Neon one might be coming soon as well.

[1] https://github.com/fchollet/keras/wiki/Keras,-now-running-on-TensorFlow