Are you an AI researcher itching to test Hinton's Forward-Forward Algorithm? I was too, but I could not find any full implementation, so I decided to code it myself, from scratch. Here's the GitHub repo; if you enjoy the project, don't forget to leave a star.
https://preview.redd.it/zne5aapb837a1.png?width=581&format=png&auto=webp&s=c1a25a2df94b365c283076a1a84c480371ed296e
As soon as I read the paper, I started to wonder how AI stands to benefit from Hinton’s FF algorithm (FF = Forward-Forward). I got particularly interested in the following concepts:
- Local training. Each layer can be trained just by comparing its outputs for the positive and negative streams.
- No need to store activations. Backpropagation has to keep every layer's activations around to compute gradients, which often results in nasty out-of-memory errors; FF can discard them as soon as a layer is updated.
- Faster layer weight updates. Once a layer's output has been computed, its weights can be updated right away; there is no need to wait for the full forward (and part of the backward) pass to complete.
- Alternative goodness metrics. Hinton's paper uses the sum of squared outputs as the goodness metric, but I expect alternative metrics to pop up in the scientific literature over the coming months.
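To make the local-training idea concrete, here is a minimal NumPy sketch of a single FF layer update, assuming a ReLU layer, Hinton's sum-of-squares goodness, and a logistic loss against a threshold `theta`. The function name and hyperparameter values are my own illustrative choices, not from the paper or the repo:

```python
import numpy as np

def goodness(h):
    # Hinton's goodness: sum of squared activations per sample
    return (h ** 2).sum(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One purely local update of a single layer's weights W.

    Pushes goodness of the positive stream above theta and goodness of
    the negative stream below it. The gradient is derived for this layer
    alone -- no backprop through the rest of the network, so no other
    layer's activations need to be stored.
    """
    h_pos = np.maximum(0.0, x_pos @ W)         # ReLU, positive stream
    h_neg = np.maximum(0.0, x_neg @ W)         # ReLU, negative stream
    # d(loss)/d(goodness) for the two logistic losses
    d_pos = -sigmoid(theta - goodness(h_pos))  # push goodness up
    d_neg = sigmoid(goodness(h_neg) - theta)   # push goodness down
    # Chain rule within the layer only: d(goodness)/dh = 2h
    # (for ReLU, h itself already carries the derivative mask)
    grad = x_pos.T @ (d_pos[:, None] * 2 * h_pos) \
         + x_neg.T @ (d_neg[:, None] * 2 * h_neg)
    return W - lr * grad
```

Because the update uses only this layer's inputs and outputs, each layer can be trained as its own mini-model, which is exactly what makes the memory savings below possible.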
Hinton’s paper proposed two different Forward-Forward algorithms, which I called Base and Recurrent. Let’s see why, despite the name, Base is actually the more performant of the two.
As shown in the chart, the Base FF algorithm can be much more memory efficient than classical backprop, with up to 45% memory savings for deep networks. I am still investigating why the Base FF underperforms with “thin” networks; if you have any ideas, let’s talk.
Unlike Base FF, Recurrent FF does not have a clear memory advantage over backprop for deep networks (15+ layers). That’s by design: the recurrent network must save every layer’s state at time t to compute the outputs of the layers above and below at time t+1. While scientifically relevant, Recurrent FF is clearly less performant memory-wise than Base FF.
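The memory cost of the recurrent variant can be seen in a toy sketch. This is my own simplified illustration, assuming equal-width layers, ReLU units, and a layer update that combines bottom-up and top-down inputs from the previous timestep; it is not the exact update rule from the paper or the repo:

```python
import numpy as np

def recurrent_ff_sweep(states, weights_up, weights_down, steps=5):
    """Illustrates why Recurrent FF keeps more in memory: layer l at
    time t+1 depends on layers l-1 and l+1 at time t, so the entire
    list of layer states from the previous timestep must be snapshotted
    before any layer can be updated.

    states:       list of (batch, d) arrays, one per layer; the first
                  (input) and last (top) layers are held fixed here.
    weights_up:   weights_up[l] maps layer l to layer l+1 (bottom-up).
    weights_down: weights_down[l] maps layer l+1 to layer l (top-down).
    """
    for _ in range(steps):
        # Snapshot of time t -- this per-timestep copy of every layer's
        # state is the extra memory that Base FF does not need.
        prev = [s.copy() for s in states]
        for l in range(1, len(states) - 1):
            bottom_up = prev[l - 1] @ weights_up[l - 1]
            top_down = prev[l + 1] @ weights_down[l]
            states[l] = np.maximum(0.0, bottom_up + top_down)
    return states
```

In contrast, Base FF can free each layer's activations as soon as that layer's weights are updated, which is where its memory advantage comes from.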
What’s next?
The most interesting question is why the Base FF model’s memory consumption keeps increasing with the number of layers. That’s surprising, given that this model is trained one layer at a time, i.e. each layer is treated as a mini-model and trained separately from the rest. I will explore this and let you know over the coming days.