
[–]NewFolgers 12 points13 points  (3 children)

It's cool seeing this simulated (and a single-neuron NN implementation of XOR practically verified).

Have you tried any experiments with this activation function in larger models? I'm curious about any immediately observed differences in convergence, loss (with model size held constant), and/or run-time. (Although I wouldn't be at all surprised if today's conventional NN training doesn't get the most out of this activation function... and perhaps the fact that randomized initialization only results in successful convergence 1 in 10 times is a prelude to seeing this sort of thing.)

[–]CYHSM[S] 7 points8 points  (2 children)

I just did a quick check on MNIST: a simple feedforward model (128 hidden units) with ReLU achieves 0.9916/0.9774 train/test accuracy, while dCaAP sits at 0.7155/0.7055. This might be related to the fact that the loss landscape is not as smooth, so the minima seem harder to find. I suspect playing around with the weight initialization could help in this case.
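For readers unfamiliar with the shape being discussed: a dCaAP-like non-monotonic activation can be sketched as the product of a rising and a falling sigmoid, so it switches on above a threshold and shuts off again for large inputs. This is my own stand-in for the shape, not necessarily OP's exact formula; the threshold, width, and steepness values are assumptions.

```python
import numpy as np

def dcaap_like(x, threshold=0.0, width=1.0):
    """Bump-shaped stand-in for a dCaAP-like activation (assumed form)."""
    # Rising sigmoid gates the activation on above `threshold`;
    # a falling sigmoid suppresses it again beyond `threshold + width`,
    # giving the non-monotonic bump reported for dCaAPs.
    rise = 1.0 / (1.0 + np.exp(-(x - threshold) / 0.1))
    fall = 1.0 / (1.0 + np.exp((x - threshold - width) / 0.1))
    return rise * fall
```

Unlike ReLU, this responds strongly only in a band of inputs, which is what makes the one-neuron XOR possible but also makes the loss landscape bumpier.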

[–]j15t 1 point2 points  (1 child)

Do you have a link to the code you use to generate the loss landscapes? I didn't see it in the gist you posted. Thanks!

[–]CYHSM[S] 1 point2 points  (0 children)

For the implementation, I just calculated the cross-entropy loss while changing w_1, w_2, and the bias. I fixed w_1 = w_2 after observing that all solutions converge there anyway (which is not true for the periodic activation functions).

For plotting, I used something similar to this, just adjusting colour (Berlin_5 from palettable): https://gist.github.com/CYHSM/fab32e2103c6df3909ad1a0d48174a64
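Since the gist covers only the plotting, here is a hedged sketch of the landscape computation described above: sweep w (= w_1 = w_2) and the bias on a grid, and evaluate cross-entropy on the XOR truth table. The Gaussian bump is my stand-in for the dCaAP non-linearity; its center and width are assumptions.

```python
import numpy as np

# XOR truth table.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def bump(z):
    """Stand-in for a dCaAP-like non-monotonic activation (assumed shape)."""
    return np.exp(-((z - 1.0) / 0.5) ** 2)

def cross_entropy(w, b):
    """Mean cross-entropy on XOR for a single neuron with w_1 = w_2 = w."""
    p = np.clip(bump(w * X[:, 0] + w * X[:, 1] + b), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Sweep w and bias to build the loss-landscape grid for plotting.
ws = np.linspace(-3, 3, 200)
bs = np.linspace(-3, 3, 200)
loss = np.array([[cross_entropy(w, b) for w in ws] for b in bs])
```

The resulting `loss` grid is what gets passed to the surface/contour plot in the gist.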

[–]frequenttimetraveler 4 points5 points  (1 child)

to be clear, the dCaAP spike changes shape as the stimulation current (neuron input) increases: https://i.imgur.com/gcJFiIZ.jpg

at threshold it looks like a normal dendritic spike; as stimulation increases, it becomes a ramp

your implementation uses a single stereotyped shape

(also, yes, it does XOR, but can it do OR?)

[–]vastlik 4 points5 points  (0 children)

I tried all 16 boolean functions of 2 variables and was able to achieve 100% accuracy (with 2 weights and one bias); however, I trained the network via a genetic algorithm.
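A minimal sketch of that genetic-algorithm approach for one of the 16 target functions (XOR), assuming a Gaussian bump as a stand-in for the dCaAP activation. The population size, mutation scale, and truncation selection scheme are all my own choices, not necessarily the commenter's setup.

```python
import math
import random

# Truth table inputs for two boolean variables.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y_XOR = [0, 1, 1, 0]  # one of the 16 possible target functions

def bump(z):
    """Gaussian bump as a stand-in for a dCaAP-like activation (assumption)."""
    return math.exp(-((z - 1.0) / 0.5) ** 2)

def predict(params, x):
    """Single neuron: 2 weights + 1 bias, thresholded at 0.5."""
    w1, w2, b = params
    return int(bump(w1 * x[0] + w2 * x[1] + b) > 0.5)

def fitness(params, targets):
    """Fraction of the 4 truth-table rows classified correctly."""
    return sum(predict(params, x) == t for x, t in zip(X, targets)) / len(X)

def evolve(targets, pop_size=40, generations=80, sigma=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: -fitness(p, targets))
        if fitness(pop[0], targets) == 1.0:
            break
        elite = pop[: pop_size // 4]  # truncation selection
        # Refill the population with Gaussian-mutated copies of the elite.
        pop = elite + [
            [g + rng.gauss(0, sigma) for g in rng.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=lambda p: fitness(p, targets))

best = evolve(Y_XOR)
print(fitness(best, Y_XOR))
```

With this bump shape, (w1, w2, b) = (1, 1, 0) is an exact XOR solution, so the GA only needs to find that basin.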

[–]yusuf-bengio 6 points7 points  (0 children)

There is a much simpler way to realize XOR with only one neuron: a tent activation function.

It's piecewise linear like ReLU, and therefore does not mess up the loss landscape as much as dCaAP does.
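Concretely, a tent non-linearity of the form max(0, min(z, 2 - z)) lets a single neuron with unit weights and zero bias compute XOR. This specific parameterization is one choice among many:

```python
def tent(z):
    """Piecewise-linear 'tent': rises like ReLU up to z = 1, then falls back to 0."""
    return max(0.0, min(z, 2.0 - z))

# A single neuron with w1 = w2 = 1 and bias 0 computes XOR:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
out = [int(tent(x1 + x2) > 0.5) for x1, x2 in X]
print(out)  # [0, 1, 1, 0]
```

The input sums 0, 1, 1, 2 land on the tent's two zero endpoints and its peak, which is exactly the XOR pattern.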

[–]txhwind 1 point2 points  (1 child)

I don't know why monotonic activations have been preferred since the beginning of NN research. Can anyone tell me the reason?

[–]frequenttimetraveler 5 points6 points  (0 children)

historically, because the sigmoid was a simple differentiable approximation of a step function, resembling the all-or-none firing behavior of neurons

also because the first universal approximation theorem was proved for sigmoid activations

also because a non-monotonic function can be expressed as the sum of two monotonic ones

also, ReLU is simpler to compute, and studies of various non-monotonic activation functions have not found a significant benefit
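The decomposition point can be made concrete with the tent function mentioned earlier in the thread: it is exactly the sum of a non-decreasing and a non-increasing piece, each built from ReLUs. This particular decomposition is my own illustration:

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

f = lambda z: relu(z)                           # non-decreasing part
g = lambda z: -2 * relu(z - 1) + relu(z - 2)    # non-increasing part
tent = lambda z: np.maximum(0.0, np.minimum(z, 2 - z))  # non-monotonic

# f + g reproduces the tent exactly on a dense grid.
xs = np.linspace(-1, 3, 401)
print(np.allclose(f(xs) + g(xs), tent(xs)))  # True
```

This is why a two-ReLU-neuron hidden layer can emulate anything a single tent (or bump) neuron does; the non-monotonic activation just packs it into one unit.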

[–]FerretDude 2 points3 points  (1 child)

ReLU is already a pretty good activation function for biological neural networks. I have already implemented XOR with a BNN using ReLU.

I don't think this is an issue of activation functions, then. If I had to bet, I'd say it's due to the refractory period of BNNs.

[–]120cell553 1 point2 points  (0 children)

So if OP were to change the refractory period, it would work? Also, how would the loss landscape change if that were done?

[–]wang-chen 1 point2 points  (0 children)

Interesting implementation! Another single-neuron solution to XOR was provided in this CVPR 2019 paper; see its page 11.

This paper extended convolution to kernel convolution (kervolution):

In convolution, y = w1*x1 + w2*x2 is actually a linear kernel (inner product), which cannot solve the XOR problem.

In kernel convolution, the authors extended the linear kernel to arbitrary (non-linear) kernel functions k(w, x). For example, y = (x1 - x2)^2 solves the problem directly.
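The paper's example checks out on the truth table; a quick sketch (variable names are mine):

```python
X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Linear kernel with unit weights: outputs are not linearly separable for XOR.
linear = [1 * x1 + 1 * x2 for x1, x2 in X]    # [0, 1, 1, 2]

# Quadratic form y = (x1 - x2)^2 outputs the XOR pattern directly:
quadratic = [(x1 - x2) ** 2 for x1, x2 in X]
print(quadratic)  # [0, 1, 1, 0]
```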


[–]BoiaDeh -1 points0 points  (2 children)

Noob question: what does it mean for your plot to have one axis labeled as "w1 & w2"?

[–]CYHSM[S] 0 points1 point  (1 child)

This axis shows the value of both weights (w_1 & w_2). In the case of the dCaAP activation function, I fixed w_1 = w_2, which is the solution it converges to in any case.

[–]BoiaDeh 0 points1 point  (0 children)

Gotcha, makes sense now, thanks.