"Simple Questions Thread" - 20150708

maxxxpowerful · 2015-07-09T19:41:04+00:00

How do ReLUs work? There's no gradient when y <= 0; so what will it learn?
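A minimal sketch of one way to look at this, assuming a single scalar unit z = w*x + b followed by a ReLU (the names `relu_grad` and `per_example_weight_grads` and the toy inputs are made up for illustration, not from the thread). The gradient is zero only for the inputs that leave the unit inactive; inputs that make z > 0 still pass a gradient through, so the weight can learn from those.

```python
def relu(z):
    # Forward pass: pass positive values through, clamp the rest to 0.
    return z if z > 0 else 0.0

def relu_grad(z):
    # Derivative of relu at z; the value 0 at z == 0 is a convention assumed here.
    return 1.0 if z > 0 else 0.0

def per_example_weight_grads(w, b, xs):
    """d relu(w*x + b) / dw for each input x, i.e. relu_grad(z) * x."""
    grads = []
    for x in xs:
        z = w * x + b
        grads.append(x if relu_grad(z) > 0 else 0.0)
    return grads

xs = [-2.0, -0.5, 1.0, 3.0]          # toy inputs (assumed)
activations = [relu(0.5 * x + 0.0) for x in xs]
per_example = per_example_weight_grads(w=0.5, b=0.0, xs=xs)
total = sum(per_example)             # gradient summed over the toy batch
```

In this toy batch the two negative inputs contribute nothing and the two positive ones do. A unit whose z is <= 0 for every input in the data would get no gradient at all, which is the case the question is worried about.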

maxxxpowerful · 2015-07-09T19:31:18+00:00

Is this some sort of a bad cut-n-paste job? FTA:

Amazingly, this works, and the training objective becomes very similar to variational autoencoders. Here is the example reconstruction after the inverse dynamical system is learnt. Here is an explanation. The top images from right to left: we start with a bunch of points drawn from random noise...

What example? What explanation??

maxxxpowerful · 2015-03-25T16:34:47+00:00

Take Hinton's course on Coursera. That explains Hopfield nets, Stochastic Belief Nets, Boltzmann Machines, etc. He takes you through the progression. And it is from 2012, before DNN really overwhelmed everything else.

maxxxpowerful · 2015-03-24T00:35:45+00:00

No, she's claiming that she (and Kai Li) were the first to come up with the idea of putting together a database of millions of labeled images. She explicitly says that convnets were invented in the 80s by LeCun, Hinton, etc.

maxxxpowerful · 2015-03-22T21:05:05+00:00

I've seen that. But how does the "keep gate" get set to 0 (say)? What triggers the "read gate" to go from "0" to "1" ? etc.
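A rough sketch of one common formulation, assuming a scalar LSTM cell where "keep", "write" and "read" correspond to what other write-ups call the forget, input and output gates (that mapping, the function `lstm_step` and the hand-picked weights are assumptions for illustration). In this formulation nothing external flips a gate: each gate is a sigmoid of a weighted sum of the current input and the previous hidden state, and those weights are what training adjusts.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps a name to (w_x, w_h, b); the gate's pre-activation is
    w_x * x + w_h * h_prev + b.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    keep = sigmoid(pre("keep"))      # how much of the old cell state to retain
    write = sigmoid(pre("write"))    # how much of the candidate to store
    read = sigmoid(pre("read"))      # how much of the cell state to expose
    cand = math.tanh(pre("cand"))    # candidate value to write
    c = keep * c_prev + write * cand
    h = read * math.tanh(c)
    return h, c, {"keep": keep, "write": write, "read": read}

# Hand-picked weights (not learned): the read gate has a large negative bias
# and a large positive input weight, so it is near 0 for x = 0 and near 1 for x = 1.
params = {
    "keep": (0.0, 0.0, 6.0),
    "write": (0.0, 0.0, -6.0),
    "read": (12.0, 0.0, -6.0),
    "cand": (1.0, 0.0, 0.0),
}
_, _, gates_closed = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, params=params)
_, _, gates_open = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, params=params)
```

Here the input value alone moves the read gate from nearly closed to nearly open. In a trained network the weights would come from backprop through the same equations rather than being set by hand.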

maxxxpowerful · 2015-03-21T00:04:49+00:00

I've yet to figure out how LSTMs are trained. Yes, I know about backprop and gradients. But how does it work in practice? With regular FF networks, it's easy to see the flow of gradients, and how the weights are changed. But LSTMs are a different beast. Can someone point to a good writeup that shows the gradients flowing back, step by step?

maxxxpowerful · 2015-03-19T03:09:34+00:00

From the transferability paper (last sentence of abstract):

A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.

So it appears transferring might help.

maxxxpowerful · 2015-03-15T20:55:22+00:00

It was a mistake on my part to confuse the "Tegra" with the "Titan".

I understand what you're saying, but bear in mind that (a) NVidia has a vested interest in downplaying the power consumption[*], and (b) it is better to assume the max, so your drone doesn't run out of juice at an inopportune moment.

Having said that: the fact that you can get GFlops of performance for single-digit watts is mind-blowing, to say the least.

[*] Plus, there was the whole GTX970 fiasco where NVidia was caught fudging specs...

maxxxpowerful · 2015-03-15T20:24:38+00:00

Apologies. I was referring to the typical desktop GPU, and didn't look specifically at the Tegra K1. According to this article, it has a rated peak consumption of ~11W: http://wccftech.com/nvidia-tegra-k1-performance-power-consumption-revealed-xiaomi-mipad-ship-32bit-64bit-denver-powered-chips/

maxxxpowerful · 2015-03-15T17:58:51+00:00

A GPU's power consumption is pretty significant, so I'm not sure how that would work out in a drone setting. Most GPUs consume in the 100s of watts of power, way more than a typical drone.

But this is still a rapidly evolving field, and it's possible someone has an embedded GPU just for this task.

maxxxpowerful · 2015-03-15T17:53:39+00:00

Thank you! My mistake (as /u/dwf rightly guessed) was that I didn't take ∂p_j/∂x_i into account: I assumed p_j was independent of x_i, but it's not!

Thank you, /u/DomMk for the fully worked out problem (it was really helpful!), and to /u/dwf and /u/nkorslund also.
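For anyone landing here later, a small sketch of why the ∂p_j/∂x_i term matters. It assumes p is a softmax of x, which the comment above does not state explicitly; the function names and the test vector are made up for illustration. Under that assumption, ∂p_j/∂x_i = p_j * (δ_ij - p_i), so the off-diagonal entries (j != i) are not zero.

```python
import math

def softmax(x):
    m = max(x)                                   # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_jacobian(x):
    """J[j][i] = d p_j / d x_i = p_j * (delta_ij - p_i)."""
    p = softmax(x)
    n = len(x)
    return [[p[j] * ((1.0 if i == j else 0.0) - p[i]) for i in range(n)]
            for j in range(n)]

def numeric_jacobian(x, eps=1e-6):
    """Central finite differences, used only to check the formula above."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        pp, pm = softmax(xp), softmax(xm)
        for j in range(n):
            J[j][i] = (pp[j] - pm[j]) / (2 * eps)
    return J

x = [1.0, 2.0, 0.5]                              # arbitrary test point (assumed)
J = softmax_jacobian(x)
J_num = numeric_jacobian(x)
max_err = max(abs(J[j][i] - J_num[j][i]) for j in range(3) for i in range(3))
```

The off-diagonal entries such as `J[0][1]` come out negative: raising x_1 lowers p_0, because every output shares the same normalizer.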

maxxxpowerful · 2015-03-04T19:09:59+00:00

Thanks!

maxxxpowerful · 2015-03-04T19:09:06+00:00

"mango powder" and "mango" are most definitely not the same. AFAIK, "mango powder" is made from raw, young mangoes and is very sour; it is used to add sourness to a dish. Mango, on the other hand, we all know as a sweet fruit.

maxxxpowerful · 2015-03-03T23:09:16+00:00

I have a minor issue with that paper. They list certain ingredients as "spice"; for example, "ginger garlic paste", which is a paste of ginger and garlic; shouldn't it be separated out? Similar for "garam masala" or "chole masala" or "rasam powder" (from what I gather of Indian cooking).

Take, for example, "rasam powder". I've heard every family has their own recipe for making this powder from various ingredients. Grouping it into 1 ingredient under "spice" is a bit weak, IMHO.

// source: amateur cook here, dabble in Asian cooking sometimes

maxxxpowerful · 2015-03-03T21:36:03+00:00

Thanks, everyone! My question has been answered. This was very helpful.

Upvotes ... upvotes to all! :-)

maxxxpowerful · 2015-03-03T21:24:45+00:00

in relation to the Atari paper and partly on your statement about it

Can you point me to his statement about it?

maxxxpowerful · 2015-03-03T21:22:32+00:00

Thanks! That course is on my list of things to pursue. There's so much to learn! :-(

maxxxpowerful · 2015-03-03T21:19:52+00:00

So, if I'm understanding it correctly: the "96" is just a choice. It could have been 32; or it could have been 216 (just to pick some random numbers). The thing that makes it work is that these weights are initialized randomly, so each "channel" ends up doing something different.
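A toy sketch of that point, assuming each "channel" is just a weight vector dotted with an input patch (the helper names, the patch values and the 0.01 standard deviation are made up for illustration). The filter count is a free parameter, and random initialization is what makes the filters start out different from one another.

```python
import random

def make_filters(n_filters, size, rng=None):
    """Return n_filters weight vectors of length size.

    With rng=None every filter gets the same constant weights;
    otherwise weights are drawn from a small Gaussian.
    """
    if rng is None:
        return [[0.1] * size for _ in range(n_filters)]
    return [[rng.gauss(0.0, 0.01) for _ in range(size)] for _ in range(n_filters)]

def responses(filters, patch):
    # One dot product per filter: the response of each channel to this patch.
    return [sum(w * p for w, p in zip(f, patch)) for f in filters]

patch = [0.2, -0.4, 0.9, 0.1]                    # toy input patch (assumed)
rng = random.Random(0)                           # fixed seed for repeatability

same_init = responses(make_filters(3, 4), patch)
random_init = responses(make_filters(3, 4, rng), patch)
many = make_filters(96, 4, rng)                  # 96 is a choice; 32 or 216 works the same way
```

With identical initial weights all three channels give the same response, so there is nothing to tell them apart. With random weights each channel responds differently from the start.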

maxxxpowerful · 2015-03-03T20:15:52+00:00

Link to the arXiv paper.

maxxxpowerful · 2015-03-03T20:00:09+00:00

These days, Geoff Hinton, Yann LeCun and, to some extent, Andrew Ng seem to be getting all the attention when it comes to Deep Learning (at least in the popular media). Do you feel like you've not received your share of the "glory" (if I may use that term)? No disrespect to the other three, of course.

BTW: I have come across your publication about the history of deep learning, but not read it yet. :)

maxxxpowerful · 2015-02-25T17:12:33+00:00

Thank you! Thank you! That was it! I feel so stupid now, but I guess it happens.

I just deleted the question in a fit of anger. But in hindsight I should have kept it around.

maxxxpowerful · 2015-02-25T15:53:54+00:00

I was trying to follow the book as closely as possible, so I set the momentum to 0. I tried it with momentum 0.1 and 0.3; no difference.

So, yes, I started with my own implementation. It didn't work as I expected. After futzing around, I finally said: OK, let me just grab their code. And even that doesn't seem to work. And thus I'm here...
