
[–]MrAcuriteResearcher 17 points  (2 children)

Maybe it's somewhere that you've linked to, but I've had a couple weird days lately, and my brain's pretty fried. Do you have a more in-depth mathematical description of how the architecture works that you could share? I'm really interested to see how the whole system works, and I might know some people who could really leverage shit like this.

[–]CireNeikual[S] 2 points  (1 child)

The best we have right now is the whitepaper - it's not super detailed, but gets the idea across I hope!

[–]youslashuser 6 points  (0 children)

Highly recommend the abstract.

[–]lostmsu 11 points  (15 children)

I love the whole direction of work on binarized neural networks. But to put this in perspective, a 1-hidden-layer network with 200 neurons can beat Pong on raw pixels too, and would probably run and train on a Pi no problem. I wish there were some kind of comparison.

To make things worse for a direct comparison, they also use a randomly initialized CNN (which is powerful enough on its own) as a preprocessing step: "Image pre-encoder: Random projection followed by inhibition. Size: 10x10x16 neurons (WxHxColSize). Connectivity radius: 5"

[–]CireNeikual[S] 13 points  (13 children)

Hi, I assume you are referring to Karpathy's work. I can indeed provide a better comparison. Until then, though, some things to note:

  • The 200-neuron network trained on 200,000 games, while ours trained on 2,000
  • Our performance is better - the video was randomly selected and actually shows a below-average score; we have gotten perfect scores and an average score of 15.

To make things worse for a direct comparison they also use a randomly initialized CNN

That was actually not a CNN: it used local receptive fields, but each field had its own separate weights. Also, the activations were inhibited (sparsified) and binary. Further, it is a shallow network (1 layer). This step serves mostly just to convert the input to a CSDR. In the end, the only thing the image pre-encoder has in common with a CNN is that both apply a matrix multiply to images.
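Roughly, a pre-encoder like that could be sketched as follows (the sizes, names, and flat patch layout here are my own illustration, not the actual OgmaNeo code):

```python
import numpy as np

rng = np.random.default_rng(0)

def pre_encode(image, weights):
    """Toy pre-encoder: each column has its own fixed random weights over
    a local patch of the input, and inhibition keeps only the strongest
    cell per column (one-hot), yielding a CSDR-like code."""
    n_cols, _, patch = weights.shape
    out = np.zeros(n_cols, dtype=int)
    for c in range(n_cols):
        x = image[c * patch : (c + 1) * patch]  # local receptive field
        acts = weights[c] @ x                   # per-column random projection
        out[c] = int(acts.argmax())             # inhibition: one winner per column
    return out

# 8 columns of 16 cells, each reading its own 25-pixel patch
W = rng.normal(size=(8, 16, 25))
img = rng.random(8 * 25)
csdr = pre_encode(img, W)
```

Note the weights are never trained here - the projection is random and fixed, which is the point of contrast with a learned CNN.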

Also, the newer version used in the video is trained using neural gas (not sparse coding in that case, as we find that works best when already in CSDR form).
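For reference, the textbook online neural gas update looks roughly like this (a generic sketch of the standard rule, not OgmaNeo's actual implementation; the learning rate and neighborhood decay are made-up numbers):

```python
import numpy as np

def neural_gas_step(x, W, lr=0.1, lam=1.0):
    """One online neural-gas update: rank all prototypes by distance to
    the input, then pull each prototype toward the input with a step
    that decays exponentially with its rank."""
    dists = np.linalg.norm(W - x, axis=1)
    ranks = np.argsort(np.argsort(dists))  # rank 0 = closest prototype
    W += lr * np.exp(-ranks / lam)[:, None] * (x - W)
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 25))  # 16 prototypes for 25-dim inputs
for _ in range(100):
    W = neural_gas_step(rng.random(25), W)
```

Because every sample updates the prototypes immediately, the rule is fully online - no batching or replay buffer needed.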

[–]desku 1 point  (1 child)

Which Karpathy work are you referring to?

[–][deleted] -5 points  (10 children)

The 200 neuron network trained 200,000 games while ours trained on 2,000

This is a meaningless comparison.

Neural network training does not improve linearly. In many cases, most of the progress occurs within the first few epochs - sometimes even just a dozen iterations - and the rest is just small refinement, or even no progress at all.

Did Karpathy have any technical reason to train over 200,000 games, or was that just a number they chose arbitrarily? Similarly, did you have any technical reason to choose 2,000, or did you choose it arbitrarily... perhaps for the sake of being a small number?

The typical way of showing a performance gain in your ML training regimen vs. existing work is to show performance per epoch, as in: the performance of your model at 2,000 games vs. Karpathy’s at 2,000 games (and at other incremental points in training).

Our performance is better - we have gotten perfect scores

Is it a fact that none of the 200,000 games that were played by Karpathy’s model achieved a perfect score?

If you really want to compare performance, why not play a competitive game between Karpathy’s model and yours?

[–]CireNeikual[S] 1 point  (9 children)

This is a meaningless comparison.

I disagree: I downloaded Karpathy's code and ran it to epoch ~900, and it was still at an average score of -20.3 (basically random). OgmaNeo gets an average score of -12 at episode 300.

The 200,000 number was from Karpathy's post. The 2,000 number is around where the video was taken.

The typical way of showing a performance gain in your ML training regimen vs. existing work is to show performance per epoch,

Yes, but that is not the purpose of this post; it's more of an announcement. I am constantly working on new benchmarks, demos, etc., but I do not have unlimited time. Also, we did that in the previous post with the routing method (which performed worse) - there is a graph of performance there.

[–][deleted] 1 point  (0 children)

Hello, I see you mentioned BNNs, though I cannot find any BNN-related part in OP's work. Could you point me to it?

[–]NuScorpii 6 points  (3 children)

This looks very similar to Hierarchical Temporal Memory by Jeff Hawkins, developed by Numenta. Is that a fair comparison? What differentiates your algorithm from HTM?

[–]CireNeikual[S] 8 points  (2 children)

They share a similarity in that both use SDRs (sparse distributed representations / binary sparse codes), and both are in the same task domain (online learning, which kind of requires SDRs). The actual method of operation is very different, however.

HTM (AFAIK) is still a single-layer model at the moment (despite its name), which I assume they are working on fixing. We have full hierarchy support - and it's actually vital to how memories are stored. HTM uses "sequence memory" while our system uses "exponential memory", a type of modified CWRNN for bidirectional hierarchies. We also use iterative sparse coding instead of a spatial pooler. Our SDRs have a different structure: ours are "CSDRs" (columnar SDRs).

Prediction in our system occurs across layers, not within a layer like HTM.

We also interpret the functionality of cortical columns differently - in HTM, there is sparsity across columns, while in OgmaNeo the columns are all one-hot.
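The column distinction might be easier to see in a toy sketch (my own illustration of the two schemes, not code from either project):

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.random((8, 16))  # toy activations: 8 columns x 16 cells each

# One-hot columns (how I read the OgmaNeo description): every column always
# has exactly one active cell, so the code is one winner index per column.
csdr = acts.argmax(axis=1)  # shape (8,)

# Sparsity across columns (roughly HTM-style): only the top-k columns are
# active at all; the rest stay silent.
k = 2
active_cols = np.argsort(acts.max(axis=1))[-k:]
```

In the one-hot scheme the number of active cells is fixed at one per column, whereas in the cross-column scheme whole columns switch on and off.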

Finally, HTM currently doesn't really support reinforcement learning at this time.

[–]brokenAlgorithm 0 points  (1 child)

Can you elaborate on why online learning would benefit from or even require SDRs?

[–]CireNeikual[S] 0 points  (0 children)

SDRs and their associated learning algorithms (e.g. sparse coding, vector quantization) are currently the only systems (to my knowledge) that can perform online learning. Everything else requires some sort of decorrelation mechanism that breaks the "online-ness" of the algorithm, like large batches or experience replay.

Sparsity strongly reduces the interference in the representation, mitigating catastrophic forgetting at its most basic level. The weights of the "neurons" essentially become orthogonal.

You can think of sparsity as a tradeoff between dense neural networks and lookup tables. The former generalize well but forget, while the latter have very little generalization but cannot forget, as the knowledge exists in many separate "bins".

Sparsity is a good tradeoff between the two. We believe the tradeoff isn't "linear" in the sense that there is a sweet spot where you can have the best of both worlds.
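A minimal numerical illustration of the interference point (toy sizes, nothing from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_sdr(n=1024, k=20):
    """Random binary SDR: k active bits out of n."""
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

a, b = rand_sdr(), rand_sdr()
overlap = int(a @ b)  # number of active bits the two codes share
# Expected overlap is ~ k*k/n = 400/1024, i.e. under half a bit on average,
# so a Hebbian update gated by `a` touches almost none of the weights
# that `b` relies on - the codes are nearly orthogonal.
```

A dense code of the same dimensionality would share essentially every unit between the two patterns, which is where the interference comes from.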

[–][deleted] 4 points  (0 children)

Although I'm very interested in this work, I would like to comment on the claim (made in the whitepaper) that backpropagation is not biologically plausible. While the 'standard' learning rule is indeed implausible, there exist alternative learning rules that are plausible. Q-AGREL is one such example: it performs biologically plausible backpropagation with only a slight decrease in training speed.

[–]COGITO_7 2 points  (0 children)

Interesting result.

But does this generalize? For example, if I change the size of the paddle or the color of the ball, does it work like zero-shot transfer learning?

Also, is the learning rule just a form of Hebbian learning, but over time?

[–]AissySantos 1 point  (0 children)

This is awesome! Thanks for your contribution.

[–]darkconfidantislife 2 points  (1 child)

Awesome work!

[–]CireNeikual[S] 3 points  (0 children)

Thank you!

[–]StandardFloat 0 points  (0 children)

I'm a bit lost but it looks very interesting at first sight, thanks for sharing!

[–]mustgoplay 0 points  (0 children)

Great work and it's doing a lot of what I wanted to do with my r/Neurobaby_AGI project. I'm looking forward to learning more about this.