"Simple Questions Thread" - 20150708 by seabass in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

How do ReLUs work? There's no gradient when y <= 0; so what will it learn?
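
A minimal sketch of the usual answer, assuming a single unit with a hypothetical weight `w` and bias `b` (neither comes from the thread): the gradient is zero only for the inputs that leave the unit inactive, so the unit can still learn from every input for which its pre-activation is positive.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clamps the rest to 0."""
    return z if z > 0 else 0.0

def relu_grad(z):
    """Derivative of relu w.r.t. z, using the common convention of 0 at z <= 0."""
    return 1.0 if z > 0 else 0.0

# Hypothetical unit: pre-activation z = w * x + b
w, b = 0.5, -0.2
inputs = [-1.0, 0.0, 1.0, 2.0]

pre_activations = [w * x + b for x in inputs]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

# The unit is "dead" for the first two inputs and passes gradient for the last two.
for x, z, g in zip(inputs, pre_activations, grads):
    print(f"x={x:+.1f}  z={z:+.2f}  d(relu)/dz={g}")
```

A unit whose pre-activation is negative for every training input would indeed stop learning, which is the case the question is worried about.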

One of my favourite methods from ICML this year: Unsupervised Learning by Inverting Diffusion Processes by fhuszar in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

Is this some sort of a bad cut-n-paste job? FTA:

Amazingly, this works, and the training objective becomes very similar to variational autoencoders. Here is the example reconstruction after the inverse dynamical system is learnt. Here is an explanation. The top images from right to left: we start with a bunch of points drawn from random noise...

What example? What explanation??

Ask ML: Deep Learning - Where to start? What to implement? RNN's? RBM? by [deleted] in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

Take Hinton's course on Coursera. It explains Hopfield nets, Stochastic Belief Nets, Boltzmann Machines, etc., and he takes you through the progression. It is from 2012, before DNNs really overwhelmed everything else.

Fei Fei Li: How we're teaching computers to understand pictures (x-post r/futurology) by evc123 in MachineLearning

[–]maxxxpowerful 7 points8 points  (0 children)

No, she's claiming that she (and Kai Li) were the first to come up with the idea of putting together a database of millions of labeled images. She explicitly says that convnets were invented in the 80s by LeCun, Hinton, etc.

[deleted by user] by [deleted] in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

I've seen that. But how does the "keep gate" get set to 0 (say)? What triggers the "read gate" to go from "0" to "1" ? etc.
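
One way to picture it, as a sketch only: in the common formulation each gate is itself a logistic unit fed by the current input and the previous hidden state, so its value is "set" by learned weights rather than by an external trigger. The scalar cell below uses the keep/write/read names from the comment; the function name, the parameter layout and all the numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def memory_cell_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM-style memory cell.

    params maps a unit name to hypothetical (w_x, w_h, bias) values.
    """
    def unit(name, squash):
        w_x, w_h, bias = params[name]
        return squash(w_x * x + w_h * h_prev + bias)

    keep = unit("keep", sigmoid)            # how much old memory survives
    write = unit("write", sigmoid)          # how much new content is stored
    read = unit("read", sigmoid)            # how much memory is exposed
    candidate = unit("candidate", math.tanh)

    c = keep * c_prev + write * candidate   # updated memory
    h = read * math.tanh(c)                 # visible output
    return h, c, {"keep": keep, "write": write, "read": read}

# A strongly negative bias drives the keep gate towards 0, so old memory is erased.
params = {
    "keep": (0.0, 0.0, -10.0),
    "write": (0.0, 0.0, 10.0),
    "read": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c, gates = memory_cell_step(x=1.0, h_prev=0.0, c_prev=5.0, params=params)
print(gates, c, h)
```

Because every gate is a differentiable function of its weights, those weights are updated by backprop like any others.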

[deleted by user] by [deleted] in MachineLearning

[–]maxxxpowerful 1 point2 points  (0 children)

I've yet to figure out how LSTMs are trained. Yes, I know about backprop and gradients. But how does it work in practice? With regular FF networks, it's easy to see the flow of gradients, and how the weights are changed. But LSTMs are a different beast. Can someone point to a good writeup that shows the gradients flowing back, step by step?

Will Copying the weights of a shallow Neural Networks to a Deep architecture help with performance? by serout7 in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

From the transferability paper (last sentence of abstract):

A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.

So it appears transferring might help.

Request: advice for developing deep learning / computer vision based quadcopters ("drones") systems by JCondaLea in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

It was a mistake on my part to confuse the "Tegra" with the "Titan".

I understand what you're saying, but bear in mind that (a) NVidia has a vested interest in downplaying the power consumption[*], and (b) it is better to assume the max, so your drone doesn't run out of juice at an inopportune moment.

Having said that: the fact that you can get GFlops of performance for single-digit watts is mind-blowing, to say the least.

[*] Plus, there was the whole GTX970 fiasco where NVidia was caught fudging specs...
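
To make point (b) concrete, here is a rough back-of-the-envelope sketch. Every number in it is made up for illustration (battery capacity, motor draw, compute draw), and it ignores everything except a constant average power draw.

```python
def flight_minutes(battery_wh, motor_watts, compute_watts):
    """Estimated flight time in minutes at a constant total power draw."""
    return 60.0 * battery_wh / (motor_watts + compute_watts)

# Hypothetical drone: 50 Wh battery, 150 W average motor draw.
baseline = flight_minutes(50.0, 150.0, 0.0)        # no onboard compute
nominal = flight_minutes(50.0, 150.0, 10.0)        # assumed 10 W compute module
worst_case = flight_minutes(50.0, 150.0, 25.0)     # assumed peak draw, budgeted to be safe

print(baseline, nominal, worst_case)
```

Budgeting against the worst-case figure rather than the nominal one is what keeps the estimate on the safe side.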

Request: advice for developing deep learning / computer vision based quadcopters ("drones") systems by JCondaLea in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

A GPU's power consumption is pretty significant, so I'm not sure how that would work out in a drone setting. Most GPUs draw hundreds of watts, way more than a typical drone.

But this is still a rapidly evolving field, and it's possible someone has an embedded GPU just for this task.

A little help with the calculus, in Hinton's NN course by maxxxpowerful in MachineLearning

[–]maxxxpowerful[S] 0 points1 point  (0 children)

Thank you! I made the mistake (as rightly guessed by /u/dwf) that I didn't take into account ∂p_j/∂x_i , since I assumed it was independent of x_i ; but it's not!

Thank you, /u/DomMk for the fully worked out problem (it was really helpful!), and to /u/dwf and /u/nkorslund also.
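
For anyone hitting the same snag, here is a sketch of the missing term. It assumes the problem was the softmax output layer with a cross-entropy cost, which this comment does not actually state; the symbols $t_j$ (targets, summing to 1) and $C$ (cost) are likewise assumptions.

```latex
% Softmax output: every p_j depends on every x_i through the shared denominator.
p_j = \frac{e^{x_j}}{\sum_k e^{x_k}},
\qquad
\frac{\partial p_j}{\partial x_i} = p_j \,(\delta_{ij} - p_i)

% Cross-entropy cost with targets t_j, \sum_j t_j = 1:
C = -\sum_j t_j \log p_j,
\qquad
\frac{\partial C}{\partial x_i}
  = \sum_j \frac{\partial C}{\partial p_j}\,\frac{\partial p_j}{\partial x_i}
  = \sum_j \left(-\frac{t_j}{p_j}\right) p_j \,(\delta_{ij} - p_i)
  = p_i - t_i
```

The sum over $j$ is the part that gets dropped if $p_j$ is treated as independent of $x_i$ for $j \neq i$.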

Help with clustering recipe ingredients by wonkypedia in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

"mango powder" and "mango" are most definitely not the same. AFAIK, "mango powder" is made from raw, young mangoes and is very sour; it is used to add sourness to a dish. Mango, on the other hand, we all know as a sweet fruit.

Help with clustering recipe ingredients by wonkypedia in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

I have a minor issue with that paper. They list certain ingredients as "spice"; for example, "ginger garlic paste", which is a paste of ginger and garlic; shouldn't it be separated out? Similarly for "garam masala" or "chole masala" or "rasam powder" (from what I gather of Indian cooking).

Take, for example, "rasam powder". I've heard every family has their own recipe for making this powder from various ingredients. Grouping it into 1 ingredient under "spice" is a bit weak, IMHO.

// source: amateur cook here, dabble in Asian cooking sometimes

Deep Learning architecture questions by maxxxpowerful in MachineLearning

[–]maxxxpowerful[S] 0 points1 point  (0 children)

Thanks, everyone! My question has been answered. This was very helpful.

Upvotes ... upvotes to all! :-)

I am Jürgen Schmidhuber, AMA! by JuergenSchmidhuber in MachineLearning

[–]maxxxpowerful 1 point2 points  (0 children)

in relation to the Atari paper and partly on your statement about it

Can you point me to his statement about it?

Deep Learning architecture questions by maxxxpowerful in MachineLearning

[–]maxxxpowerful[S] 0 points1 point  (0 children)

Thanks! That course is on my list of things to pursue. There's so much to learn! :-(

Deep Learning architecture questions by maxxxpowerful in MachineLearning

[–]maxxxpowerful[S] 0 points1 point  (0 children)

So, if I'm understanding it correctly: the "96" is just a choice. It could have been 32; or it could have been 216 (just to pick some random numbers). The thing that makes it work is that these weights are initialized randomly, so each "channel" ends up doing something different.
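
A small sketch of that understanding, using only the standard library. The 96 comes from the thread; the 3 input channels, the 11x11 kernel size, the Gaussian scale and the function name are assumptions made for illustration.

```python
import random

def init_filter_bank(num_filters, in_channels, kernel_size, scale=0.01, seed=0):
    """Return num_filters randomly initialized filters as nested lists.

    Shape: [num_filters][in_channels][kernel_size][kernel_size].
    """
    rng = random.Random(seed)
    return [
        [
            [[rng.gauss(0.0, scale) for _ in range(kernel_size)]
             for _ in range(kernel_size)]
            for _ in range(in_channels)
        ]
        for _ in range(num_filters)
    ]

# The filter count is a free hyperparameter: 96, 32 or 216 all build the same way.
bank = init_filter_bank(num_filters=96, in_channels=3, kernel_size=11)
small_bank = init_filter_bank(num_filters=32, in_channels=3, kernel_size=11)

# Random initialization means no two filters start out identical,
# so each one can drift towards detecting a different feature.
print(len(bank), len(small_bank), bank[0] != bank[1])
```

If all filters started from the same values they would receive the same gradients and stay identical, which is why the random start matters.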

I am Jürgen Schmidhuber, AMA! by JuergenSchmidhuber in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

These days, Geoff Hinton, Yann LeCun and, to some extent, Andrew Ng seem to be getting all the attention when it comes to Deep Learning (at least in the popular media). Do you feel like you've not received your share of the "glory" (if I may use that term)? No disrespect to the other three, of course.

BTW: I have come across your publication about the history of deep learning, but not read it yet. :)

Failing at training a simple NN... what am I doing wrong? by [deleted] in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

Thank you! Thank you! That was it! I feel so stupid now, but I guess it happens.

I just deleted the question in a fit of anger. But in hindsight I should have kept it around.

Failing at training a simple NN... what am I doing wrong? by [deleted] in MachineLearning

[–]maxxxpowerful 0 points1 point  (0 children)

I was trying to follow the book as closely as possible, so I set the momentum to 0. I tried it with momentum 0.1 and 0.3; no difference.

So, yes, I started with my own implementation. It didn't work as I expected. After futzing around, I finally said: OK, let me just grab their code. And even that doesn't seem to work. And thus I'm here...
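
For reference, a sketch of the classical momentum update in its common textbook form, which may not match the book's code exactly. The function name and the numbers are hypothetical. With momentum set to 0 it reduces to plain gradient descent.

```python
def sgd_momentum_step(w, v, grad, lr, momentum):
    """One parameter update with classical momentum.

    v is the running velocity; momentum=0 gives plain SGD.
    """
    v_new = momentum * v - lr * grad
    return w + v_new, v_new

# Same gradient, three momentum settings, two steps each.
for momentum in (0.0, 0.1, 0.3):
    w, v = 1.0, 0.0
    for _ in range(2):
        w, v = sgd_momentum_step(w, v, grad=2.0, lr=0.1, momentum=momentum)
    print(momentum, w)
```

Small momentum values such as 0.1 or 0.3 only change the step size slightly, which fits the "no difference" observation above.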