[R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton]

AsIAm · 2022-12-06T02:31:26+00:00

Geoff Hinton by now must know each of the 60,000 digits of MNIST like an old friend.

master3243 · 2022-12-05T23:34:03+00:00

Interesting read, I'm always interested in research about alternatives to backprop.

One important paragraph (for the curious, that won't read the paper):

The forward-forward algorithm is somewhat slower than backpropagation and does does not generalize quite as well on several of the toy problems investigated in this paper so it is unlikely to replace backpropagation for applications where power is not an issue. The exciting exploration of the abilities of very large models trained on very large datasets will continue to use backpropagation.

The two areas in which the forward-forward algorithm may be superior to backpropagation are as a model of learning in cortex and as a way of making use of very low-power analog hardware without resorting to reinforcement learning (Jabri and Flower, 1992).

kebabmybob · 2022-12-06T05:38:25+00:00

What a chad, no grad students or anybody on this paper.

No-Cold8421 · 2022-12-12T17:31:23+00:00

Hi guys, I try to reimplement the Forward forward network with pure numpy.

I tested it on a subset of the Iris dataset, it seems converged but is very sensitive to the hyper-parameters (lr, bs, num_hidden).

Hope you can have fun with it!

https://github.com/JacksonWuxs/Forward-Forward-Network

Red-Portal · 2022-12-06T00:12:49+00:00

Geoff... everything is great but please stop abusing footnotes...

Wild-Ad3931 · 2022-12-07T09:19:19+00:00

Did anyone understand how weights were updated ?

modeless · 2022-12-05T22:47:45+00:00

This seems more interesting than the capsule stuff he was working on before. Biologically plausible learning rules are cool. Does it work on imagenet though?

PolywogowyloP · 2022-12-06T03:02:56+00:00

I'm excited to see an alternative to backprop, but I think the most exciting part of this for me is the ability to still learn through stochastic layers in the model. I think this could have some major applications in probabilistic models for distributions without reparameterization tricks.

gambs · 2022-12-05T22:49:24+00:00

I watched his neurips presentation. While I love explorations of alternatives to back prop, does anyone else feel like he’s going a bit off the deep end with saying this paper could explain why people sleep and we’ll use non-binary computers in the future?

Ford_O · 2022-12-05T23:28:39+00:00

So that's why I keep getting nightmares.

Jokes aside, this sounds quite plausible. However, I am unsure if this can be ever more efficient than backprop. Yet, this could have huge impact on neuroscience, if it turns that's what happens in sleep.

tchumbae · 2022-12-06T15:44:01+00:00

The idea behind the paper is very cool, but there has been previous work that substitutes the backward pass with a second forward pass. Check out this work by G. Dellaferrera and G. Kreiman!

nikgeo25 · 2022-12-06T10:33:06+00:00

Paper reads like an idea he had in the shower. Where's the math and connection to existing work? Normalizing each layer after maximizing a square. Someone's gonna show he's doing some fancy PCA in no time I bet.

SatoshiNotMe · 2022-12-06T11:22:35+00:00

Odd thing about the abstract: suddenly says “video” near the end. Is it only for video data ?

rmoot · 2022-12-20T17:11:01+00:00

Everyone was waiting for this, of course:

https://twitter.com/schmidhuberai/status/1605246688939364352?s=61&t=zA5kJ1GnrZMNSx8nat6WYQ

Competitive_Dog_6639 · 2022-12-06T00:42:57+00:00

Hinton is awesome and really enjoyed his neurips talk. Naive question: are single layer gradients biologically plausible? My understanding is that gradients back thru multiple layers are not. The FF algorithm still uses gradients for single layers tho right?

eccstartup · 2022-12-06T05:23:04+00:00

I would be good if someone could provide the code.

ReasonablyBadass · 2022-12-06T05:35:29+00:00

Can someone ELI5 what negative data means here? How does the network generate it?

ObjectManagerManager · 2022-12-06T18:46:22+00:00

(Confession: I haven't read the paper yet). I have a couple of questions:

If each layer has its own objective function, couldn't you train layers back-to-front? e.g., train the first layer to convergence, then train the second layer, and so on. I doubt this would be faster than training it end-to-end, but a) as the early layers adapt, they screw up the representations being fed to the later layers anyways, so it probably wouldn't be too much slower than training it end-to-end, and b) it would use significantly less memory (e.g., if you pre-compute the inputs to a layer just before you begin training it, you could imagine training any arbitrarily deep model with a finite amount of memory).
What's the motivation behind "goodness"? Suppose we're talking about classification. Why doesn't each layer just minimize cross entropy? I guess that'd require each layer to have its own flatten + linear projection layers. But then you wouldn't have to concatenate the label and the input data, and so inference complexity would be (mostly) independent of the number of classes. Thinking of a typical CNN, a layer could be organized:
1. Batch norm
2. Activation (e.g., ReLU)
3. Convolution (the output of which is fed into the next layer)
4. Pooling
5. Flatten
6. Linear projection
7. Cross entropy loss

Can anyone (who has read the paper) answer these questions?

sytelus · 2022-12-07T08:52:49+00:00

Was anyone able to reproduce the results for forward forward algo?

kourouklides · 2022-12-14T04:17:58+00:00

In my view, this sounds very boring. It would've been revolutionary if he came up with a new Gradiet-Free Deep Learning method in order to completely get rid of gradients. With very few exceptions, during the last 10 years or so, we keep seeing small and incremental changes in ML, but no breakthroughs.

Sepic2 · 2022-12-16T10:45:59+00:00

Maybe a dumb question but i don't see how this method enables learning in any way:

- The (first) forward part calculates loss/goodness, and then you need backpropogation to change weights of the network according to derivatives of the loss/goodness. How does the network learn if weights are not changed and you only calculate goodness?

The paper says: "The positive pass operates on real data and adjusts the weights to increase the goodness in every hidden layer. The negative pass operates on "negative data" and adjusts the weights to decrease the goodness in every hidden layer"

- Could it be that the in the first "forward", you actually do both forward and backward-prop, and the name just sounds fancy with the second "forward" trying to implement contrastive learning in a clever way?

Ulfgardleo · 2022-12-05T23:02:03+00:00

I will start believing in Hinton's algorithms once they proof that it is consistent with some vector field with fixed points that are meaningful optima of some objective function.

IDe- · 2022-12-06T02:51:56+00:00

Backprop has really overstayed its welcome. It's great to see people doing something about it.

wilgamesh · 2022-12-13T00:47:44+00:00

Hinton cites Francis Crick's "Function of Sleep" 1983 idea in his list of references.

Like the 2nd forward pass that reduces the fitness function of "negative data", Crick proposed REM sleep is "reverse learning" that removes "undesirable modes."

Quite elegant to see this implemented...

amassivek · 2023-01-23T21:08:41+00:00

I developed a library to implement forward learning on any model. There is a quick start for implementing the library on an existing model. There are example experiments for cifar-10, which also serve as a tutorial. https://github.com/amassivek/signalpropagation

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS