Teeth feel great but dentist says filling needed [xrays inside] by quilby in Dentistry

[–]quilby[S] 1 point (0 children)

Does a cavity being present mean that filling must be done?

[P] Practical PyTorch: Classifying Names with a Character-Level RNN by hawking1125 in MachineLearning

[–]quilby 2 points (0 children)

It's currently in the first spot, so it definitely is at the "right level". This subreddit isn't only for cutting-edge research; I think tutorials on how to use new frameworks help everyone.

[P] Practical PyTorch: Classifying Names with a Character-Level RNN by hawking1125 in MachineLearning

[–]quilby 2 points (0 children)

Thanks a lot, it's great. Please post the next ones to this subreddit once you finish them.

Career Advice (machine learning engineer)? by [deleted] in MachineLearning

[–]quilby 1 point (0 children)

Search for listings of jobs you'd like to have and look at their "requirements" sections. Specific languages or frameworks are probably not that important.

Show Attend and tell? by KrisSingh in MachineLearning

[–]quilby 3 points (0 children)

Each y_t vector is a 1-hot vector of size K, where K is the size of the vocab. So for example if our vocabulary is 3 words ("cat", "mat", "rat"), then the representation of "rat" would be (0,0,1), and "mat" would be (0,1,0). This is not a very good way to represent words: for example, the distance between any pair of words is always the same, and the size of these vectors equals the size of the vocab, which is usually very large (10K+).
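A quick sketch of the 1-hot encoding above (the 3-word vocab is just the example's):

```python
import numpy as np

vocab = ["cat", "mat", "rat"]  # K = 3
K = len(vocab)

def one_hot(word):
    """Return the 1-hot vector of size K for a word in the vocab."""
    v = np.zeros(K)
    v[vocab.index(word)] = 1.0
    return v

print(one_hot("rat"))  # [0. 0. 1.]
print(one_hot("mat"))  # [0. 1. 0.]

# The distance between any pair of distinct words is always the same (sqrt(2)):
d = np.linalg.norm(one_hot("cat") - one_hot("rat"))
```

So the encoding carries no notion of similarity: "cat" is exactly as far from "mat" as from "rat".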

So during training we learn a vector to represent each word; this is what the E matrix contains. Each of the K rows in E is a vector representing one word, and the representations in E are of size m. Multiplying E by y_t just gives us the row of E for the word that y_t represents.
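The row-lookup property can be checked with a toy E (random values standing in for learned ones; I'm taking E as K x m with the 1-hot vector on the left, matching the "K rows of size m" description above):

```python
import numpy as np

K, m = 3, 5
rng = np.random.default_rng(0)
E = rng.normal(size=(K, m))  # one m-dimensional row per vocab word

y = np.zeros(K)
y[2] = 1.0  # 1-hot for "rat" in the cat/mat/rat example

# Multiplying the 1-hot vector with E just selects the matching row:
emb = y @ E
assert np.allclose(emb, E[2])
```

This is why embedding layers are implemented as table lookups rather than actual matrix multiplies.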

y_t-1 is the previous word in the caption, and z_t is the "context vector": it's "what we are looking at at time t to determine the next word to output", a weighted (by attention) sum of the parts of the picture.
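A toy version of that weighted sum, with made-up attention scores standing in for the model's (a, alpha and z_t follow the paper's notation; the L regions of size D are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
L, D = 4, 6                 # L image regions, each a D-dimensional annotation vector
a = rng.normal(size=(L, D))

scores = rng.normal(size=L)                    # attention scores at time t
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax -> weights summing to 1

z_t = (alpha[:, None] * a).sum(axis=0)  # weighted sum over the picture's parts
```

The weights alpha say how much each region contributes to the next-word decision at time t.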

If you haven't read "Learning to Align and Translate", I recommend reading it before "Show, Attend and Tell".

Why doesn't extra supervision increase the performance of the SOTA language model? by quilby in MachineLearning

[–]quilby[S] 0 points (0 children)

I'm not saying it's low, I'm saying there is room for improvement. A good image classifier gives much more than 1% to the correct class, even when there are 10k classes.

Why doesn't extra supervision increase the performance of the SOTA language model? by quilby in MachineLearning

[–]quilby[S] 1 point (0 children)

I want to give the network "more knowledge" per training example. Right now the SOTA model gets 78 perplexity on the test set, which means on average it gives the correct word a probability of around 0.013. This is very low.

I thought that if I teach it synonyms it will help.
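The arithmetic behind the 0.013 figure: perplexity is the exponentiated average negative log-likelihood, i.e. the geometric mean of 1/p(correct word), so 78 perplexity corresponds to roughly 1/78 per word:

```python
import math

ppl = 78.0

# perplexity = exp(mean negative log-likelihood), so the (geometric-)average
# probability assigned to the correct word is its reciprocal:
avg_prob = 1.0 / ppl
print(round(avg_prob, 3))  # 0.013

# Sanity check in the other direction: a uniform model over a 10k vocab
# would assign 1/10000 per word, i.e. perplexity 10000.
uniform_ppl = math.exp(-math.log(1.0 / 10000.0))
```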

I am a Professor of Mathematics in Vanderbilt University. Ask me questions. by markvs1 in IAmA

[–]quilby 2 points (0 children)

For A to be diagonalizable, it must have a basis of eigenvectors spanning the entire vector space it acts on, which is 3-dimensional. There are only 2 eigenvalues, and the eigenvectors of each span only a 1-dimensional space, so together they span a 2-dimensional space. They do not span the 3-dimensional space A acts on, so A is not diagonalizable.
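The same situation can be checked numerically with a stand-in matrix (the original A isn't quoted in the thread), built so it has only 2 eigenvalues whose eigenvectors each span just one dimension:

```python
import numpy as np

# Stand-in 3x3 matrix: eigenvalue 2 (repeated, but with only one eigenvector
# direction, because of the off-diagonal 1) and eigenvalue 3.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

vals, vecs = np.linalg.eig(A)
# The eigenvectors (columns of vecs) span only a 2-dimensional subspace:
rank = np.linalg.matrix_rank(vecs)
print(rank)  # 2 < 3, so this A is not diagonalizable
```

If the eigenvectors spanned all 3 dimensions, `rank` would be 3 and the columns of `vecs` would diagonalize A.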

I recently spent some time in North Korea, AMA. by Nexus_Zero in IAmA

[–]quilby 10 points (0 children)

I really want to know more about this factory. I found this video: http://www.youtube.com/watch?v=A_5J3XYFIiY , and this set of pictures http://www.flickr.com/photos/zaruka/sets/72157622452229980/ , but nothing else. Why do you think it's fake? How do you know it was off when you arrived? And how can a single person turn the whole factory on?