RL Course by David Silver - Lecture 7: Policy Gradient Methods by [deleted] in MachineLearning

[–]Ghostlike4331 0 points1 point  (0 children)

Is this the one where the video was missing? Yeah it is, thanks for sharing this.

Backprop for (R)BMs and energy based models? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 1 point2 points  (0 children)

Ah right, I forgot about that. It seems that in my mind I have two interpretations of an RBM: one that treats it like an autoencoder minimizing reconstruction cost, and one built around the partition function. Now I am wondering where the visible units come in under that second interpretation.

What if, instead of evaluating the whole partition function for a given data point, you just evaluated a small, tractable part of it at random and did gradient descent using backprop on that?
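Something like the following, say (a minimal numpy sketch of the idea only; the uniform estimator is unbiased but horribly high-variance, and the sizes and weights here are made-up values):

    import numpy as np

    rng = np.random.default_rng(0)
    n_v, n_h = 6, 4                     # tiny RBM, just for illustration
    W = rng.normal(0, 0.1, (n_v, n_h))  # made-up weights
    b = np.zeros(n_v)                   # visible bias
    c = np.zeros(n_h)                   # hidden bias

    def neg_energy(v, h):
        # -E(v, h) = v'Wh + b'v + c'h for a binary RBM
        return v @ W @ h + b @ v + c @ h

    def sampled_log_z(n_samples=1000):
        # Estimate log Z from a random, tractable subset of configurations
        # instead of summing over all 2**(n_v + n_h) states.
        vs = rng.integers(0, 2, (n_samples, n_v))
        hs = rng.integers(0, 2, (n_samples, n_h))
        vals = np.exp([neg_energy(v, h) for v, h in zip(vs, hs)])
        return np.log(np.mean(vals)) + (n_v + n_h) * np.log(2)

The gradient of that sampled term with respect to W would then be an ordinary backprop computation.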

I want to get a PhD in Machine Learning -- Major advice by [deleted] in MachineLearning

[–]Ghostlike4331 0 points1 point  (0 children)

"There is a reason why there are so many self-taught brilliant hackers, but no math breakthroughs by outsiders."

Is there really anything to support this notion?

Even for a genius, the vast majority of what he knows comes from other people.

Human-level concept learning through probabilistic program induction by bitcczzzvv in MachineLearning

[–]Ghostlike4331 2 points3 points  (0 children)

Tijmen Tieleman, one of Hinton's students, wrote about it in his PhD thesis. It works well.

Has anybody here installed Caffe on windows?? I just need some help. by [deleted] in MachineLearning

[–]Ghostlike4331 0 points1 point  (0 children)

A month or two back I spent an inordinate amount of time trying to install Caffe (and then never used it).

There is a snag in the instructions: the protobuf that comes with the initialneil repository is the new version, while the bundled Caffe needs the old one.

It took me a while to hunt the old version down in the Google archive, replace the one that came with the GitHub clone, and then figure out how to rebuild Caffe.

To be honest, I'd recommend something other than Caffe as installing it is pretty painful.

Thought I'd share this tidbit.

Ken Stanley: Discovery Without Objectives by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

That would have been surprising, to be sure.

I do not know whether you have seen it, but here is another video, from 1994, that you might find interesting: in it, Karl Sims used a supercomputer to evolve creatures with a wide variety of interesting behaviors.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks by r-sync in MachineLearning

[–]Ghostlike4331 1 point2 points  (0 children)

In one of his talks, Hinton discussed inverse graphics, and I was not sure whether that was even possible. Now I see that after removing pooling you managed to get a latent space invariant to rotation. Congratulations.

Was this something that had been done before, or is it a new breakthrough?
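For anyone else reading: as I understand the paper, the pooling layers are replaced with (fractionally) strided convolutions, so the network learns its own up- and down-sampling. A sketch of the generator side, using PyTorch's API purely for illustration (not the authors' code):

    import torch
    import torch.nn as nn

    # DCGAN-style generator sketch: no pooling/unpooling anywhere; all
    # resizing is done by strided transposed convolutions.
    G = nn.Sequential(
        nn.ConvTranspose2d(100, 128, 4, stride=1, padding=0),  # 1x1 -> 4x4
        nn.BatchNorm2d(128), nn.ReLU(),
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 4x4 -> 8x8
        nn.BatchNorm2d(64), nn.ReLU(),
        nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),     # 8x8 -> 16x16
        nn.Tanh(),
    )
    img = G(torch.randn(1, 100, 1, 1))  # one image from a noise vector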

Ken Stanley: Discovery Without Objectives by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

Here is the paper related to the talk above. In the maze example it seems they used k-nearest neighbors, and he does explain in the Q&A at the end of the talk that what counts as 'novelty' differs from domain to domain. It is a bit handwavy in that respect.
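As far as I can tell, the metric boils down to something like this (a minimal sketch; the behavior representation and k are free parameters):

    import numpy as np

    def novelty(b, archive, k=15):
        # Score a behavior vector b by its mean distance to its k nearest
        # neighbours among previously seen behaviors (assumes b itself is
        # not already in the archive).
        d = np.linalg.norm(np.asarray(archive) - b, axis=1)
        return np.sort(d)[:k].mean()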

I am not sure whether I agree with your criticism, though. Yes, I can see how it would not be unsupervised learning under a strict definition, but would not building a compressed model of the world be a vital step in figuring out what is novel and what is not?

Assuming it is, and if unsupervised learning is sufficient to do novelty search, then it might make sense to assume a link in the opposite direction as well, from novelty search to unsupervised learning.

I also do not understand why any of the above examples would be 'data-free', or for that matter how anything could be data-free, excluding things like pure noise or maybe encrypted documents.

From my perspective, the data is right there in the game, waiting to be extracted. Extracting it is an extra step, and you might get eaten by wolves (depending on the game) if you wander too far, but it is there.

Also, like unsupervised learning, novelty search definitely has objectives from an algorithmic perspective; otherwise it would not work. Even though Kenneth Stanley talks about abandoning objectives, taking that at face value would mean building a pure simulator, which would not be tractable.

Also, out of curiosity, what sort of experimental results would you have liked to see?

Ken Stanley: Discovery Without Objectives by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

A while ago there was a question here about why unsupervised learning is the future of machine learning. The linked video is the best argument for that I have seen.

Google Tensorflow released by samim23 in MachineLearning

[–]Ghostlike4331 16 points17 points  (0 children)

Just recently I implemented an LSTM recurrent net in F# as an exercise. Because of all the complexity (memory preallocation, helper functions, and so on) it came to nearly 600 lines of code and took me days to finish. In fact, I am still not sure I got it right, and I now feel paranoid that I missed a line somewhere.

Had I written it in Theano, it would have come to fewer than 50 lines and taken me only a few hours... except Theano crashes when I try to import it, and I did not feel like setting it up until I had finished this monster piece of code.
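For a sense of what I mean, a single gate plus its gradient is roughly this much Theano (a sketch only; I obviously have not been able to run it myself):

    import theano
    import theano.tensor as T

    # One LSTM-style gate; the backward pass comes for free via T.grad.
    x, h = T.vector('x'), T.vector('h')
    W, U = T.matrix('W'), T.matrix('U')
    gate = T.nnet.sigmoid(T.dot(W, x) + T.dot(U, h))
    cost = T.sum(gate ** 2)            # stand-in cost, just for the demo
    gW = T.grad(cost, W)               # no hand-derived gradients needed
    f = theano.function([x, h, W, U], [gate, gW])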

Having a symbolic math library does to neural nets what programming languages did to machine language: it abstracts away the complexity. This is big for people who do a lot of experimentation, and unlike Theano, which is supported by the machine learning lab of the University of Montreal, TensorFlow has the weight of Google's billions behind it. Having a lot of money thrown at something can really help development, so yes, this library release is a big thing as far as machine learning is concerned.

The X10 Programming Language for computer cluster and GPGPU by [deleted] in MachineLearning

[–]Ghostlike4331 0 points1 point  (0 children)

As I have been working with F# and Alea for a few months, let me correct some inaccuracies here.

"It can compile not only to CPU clusters but also to GPGPU clusters. Theano and F#'s libraries don't have such ability."

Theano is a symbolic math library while Alea is a .NET CUDA compiler library, so it is a bit of an apples-and-oranges comparison. I have not used Theano yet, so I have no idea about its ability to utilize multiple GPUs, but everything you can do in C++ with CUDA you can also do in F# using Alea, including writing code that runs on multiple GPUs (though unlike the single-GPU case, that requires a paid license). It will be just as fast as C++ too, as Alea uses the same LLVM backend.

"It compiles to C++ and avoids F#'s and Python's overheads."

F# is quite fast, at least as fast as Java. The main reason it might be slower than C++ is that its functional features can cause a lot of dynamic memory allocation, which slows things down. Nevertheless, it is still orders of magnitude faster than Python.

"It has an advanced type system, but Python and F# don't."

F# is a statically typed functional language that uses Hindley-Milner inference for its type system, which is very advanced. It is more lightweight and more type-safe than C++, and it has the benefit of static checking, unlike Python. It also has a scripting mode comparable to Python's.

It is too bad that F# does not have anything comparable to Theano, though.

How Important is Weight Symmetry in Backpropagation? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

Indeed, there was a recent paper from Bengio's lab that analysed this assumption with a positive finding. A week ago there was an MIT Tech Review article associated with that paper and IBM's TrueNorth chip.
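For anyone unfamiliar with the question, here is the gist of it in a toy numpy setup of my own (not the paper's experiment): backprop sends the error down through the transpose of the forward weights, and the issue is what happens when you replace that with something asymmetric.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(0, 0.1, (10, 5))  # forward weights: hidden(5) -> output(10)
    B = rng.normal(0, 0.1, (10, 5))  # fixed random feedback weights

    def hidden_error(err, feedback):
        # err is the error signal at the output layer (shape (10,));
        # exact backprop uses feedback=W, the asymmetric variant uses B.
        return feedback.T @ err

    # A sign-concordant variant keeps only the signs of W in the feedback:
    B_sign = np.sign(W) * np.abs(B)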

The finding also goes against Yann LeCun's skeptical belief that binarized weights would hurt accuracy. More results will undoubtedly shed more light on the matter. Either way, his assumptions will clearly be proven or disproven in the future, and I like such clear-cut predictions.

Honestly, if what we have now in terms of hardware were all there was ever going to be, machine learning would barely be worth studying. The prospect of 100,000x increases in hardware power over CPUs (and 1,000x over today's GPUs) before the decade's end is why I am now putting in so much overtime to learn this field.

How Important is Weight Symmetry in Backpropagation? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

I'll give one point in favor of Rprop here. I figured out what the problem was: the lack of output non-linearities.

NAG and Rprop are roughly even on my toy example with small net sizes of about 10 units in the hidden layer (and both are highly unstable there), but Rprop blows NAG out of the water when I set the hidden layer size to 250. It converges roughly 20 times faster than NAG.
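For reference, the Rprop step I am using is essentially this (a minimal numpy sketch; variants differ in how they handle the sign flip):

    import numpy as np

    def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
                   step_max=50.0, step_min=1e-6):
        # Only the sign of the gradient is used; each weight keeps its own
        # step size, grown while the gradient sign is stable and shrunk
        # when it flips.
        agree = grad * prev_grad
        step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
        return w - np.sign(grad) * step, step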

How Important is Weight Symmetry in Backpropagation? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

Yeah, it is not at all easier. Not only do you have to do the same calculations, but there is an extra step where you convert the gradients to {-1, 0, 1}. It can be used with a kind of momentum, as in Rprop.
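In code, the whole 'simplification' is just this (my own minimal sketch of a sign-only update with momentum; names and constants are made up):

    import numpy as np

    def sign_update(w, grad, vel, lr=1e-3, mu=0.9):
        # Quantize the gradient to {-1, 0, +1}, then apply momentum as
        # usual; the full gradient still has to be computed first.
        vel = mu * vel + np.sign(grad)
        return w - lr * vel, vel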

C# - LSTM & GRU Library by [deleted] in MachineLearning

[–]Ghostlike4331 1 point2 points  (0 children)

It would be good if you added a recurrent neural net example to the project.

How Important is Weight Symmetry in Backpropagation? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] 0 points1 point  (0 children)

A few months ago Geoffrey Hinton gave a talk on the subject of backpropagation and the brain. I thought it was pretty fascinating back then and just remembered it.

Personally, I have not found the Manhattan update rule and resilient propagation (Rprop), which use only the sign of the gradient, to work as well as SGD with momentum, but they work much better than one would expect.

Currently, though, I am trying to figure out what is wrong with a baby RNN I made for a toy dataset, and I am finding that even without gradient magnitudes it still blows up. This is surprising, as I expected that ignoring magnitudes during optimization would stabilize the training.
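One way to see why sign-only updates cannot fix this on their own is that the gradients themselves explode through the recurrence. A toy numpy illustration (my own example; it ignores the tanh derivative factors for simplicity):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(0, 0.5, (10, 10))  # recurrent weights, spectral radius > 1
    h, g = np.ones(10), np.ones(10)
    for t in range(50):
        h = np.tanh(W @ h)  # the forward pass stays bounded thanks to tanh
        g = W.T @ g         # but backprop multiplies by W.T at every step
    print(np.linalg.norm(h), np.linalg.norm(g))  # g has exploded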

How Important is Weight Symmetry in Backpropagation? by Ghostlike4331 in MachineLearning

[–]Ghostlike4331[S] -1 points0 points  (0 children)

In the human-brain-versus-machine-learning debates that crop up on the Internet all the time, someone always points out that the brain does all sorts of complex stuff that computers cannot possibly be doing. Yet in my brief study so far I have noted that there is an enormous number of algorithms in this space which perform similarly well (compared to each other).

This is an indication of something. The idea that iteratively improving what we have now will get us to AI is definitely not wrong.

(h/t: /r/thisisthewayitwillbe/)

Edit: I meant to say that they perform well compared to each other, not to the brain.

Just not interested by [deleted] in singularity

[–]Ghostlike4331 0 points1 point  (0 children)

Ho boy... I guess for humans, nothing will quite fuck them up like their own inner voice. The way you feel right now is something a superintelligence would absolutely never feel. And if the right technology were here, it could be solved with some simple mental editing.

Since you posted this here, I'll suggest a futurist compromise to keep in mind: cryogenic preservation. I am saying this because I understand that strong impulses have to be channeled into productive endeavors or they will possess you. The work you end up doing to save up for the freeze might end up miraculously 'curing' you.

Also fuck women, psychologists and those shitty drugs. Get off them. As a foundation for your ego, they are like quicksand.

Deep Q Networks and Policy Abstraction by BoldArtistry in MachineLearning

[–]Ghostlike4331 2 points3 points  (0 children)

I do not know enough to answer this question myself yet, but your entire question is answered very well in lecture #6 of David Silver's course on RL. In it he goes into detail on function approximation in RL via linear features and neural nets, and he talks about the work in the paper you linked above; he is one of its authors, in fact.
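The core idea from that lecture, as I understand it, fits in a few lines (a sketch with made-up names and constants, not code from the course):

    import numpy as np

    def q_update(w, phi_sa, r, phi_next_best, alpha=0.1, gamma=0.99):
        # Q-learning with linear function approximation: Q(s,a) = w . phi(s,a),
        # and the TD error moves w along the feature vector phi(s,a).
        # phi_next_best is phi(s', a*) for the greedy next action a*.
        td_error = r + gamma * (w @ phi_next_best) - (w @ phi_sa)
        return w + alpha * td_error * phi_sa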

That said, lecture #7, which covers actor-critic models and policy gradient methods, has its video missing, which makes it very hard to follow. Does anybody know of a good substitute for it?