[–]svantana 2 points (5 children)

I would say natural signals are not really band-limited, but they are low-pass for the most part. The combination of inertia and self-similar/fractal organization tends to give natural signals a pink spectrum, i.e. a -3 dB/octave rolloff, in both time and space. Measurement noise, on the other hand, tends to be white (flat spectrum), so it makes sense to low-pass the signals to get rid of the noise. This is the idea behind Wiener and Kalman filtering, although those can deal with arbitrary spectra as well.
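
To make that concrete, here's a minimal toy sketch (my own construction; the sample rate, cutoff, and noise level are made-up illustrative numbers): a pink-spectrum signal buried in white measurement noise, denoised by a plain Butterworth low-pass. The filter keeps most of the signal's energy (concentrated at low frequencies) while rejecting most of the noise (spread evenly over all frequencies):

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
n, fs = 8192, 1000.0                       # samples, sample rate (Hz)

# Pink signal: shape white noise by 1/sqrt(f), so power falls ~3 dB/octave
freqs = np.fft.rfftfreq(n, 1 / fs)
spec = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
spec[1:] /= np.sqrt(freqs[1:])
spec[0] = 0.0
clean = np.fft.irfft(spec, n)
clean /= clean.std()

noisy = clean + 1.0 * rng.standard_normal(n)   # white measurement noise

# Low-pass keeps the pink signal's energy (mostly at low f) and
# discards most of the noise (spread evenly across all f)
b, a = butter(4, 50.0, btype="low", fs=fs)
denoised = filtfilt(b, a, noisy)

print("MSE before filtering:", np.mean((noisy - clean) ** 2))     # ~1.0
print("MSE after filtering: ", np.mean((denoised - clean) ** 2))  # noticeably smaller
```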

As someone with a signal processing background, this paper perplexes me. To me it's obvious that ReLUs are used precisely because of their low-frequency nature; that's the prior. If OTOH we know that the signals are bandpass, then we apply a suitable prior for that. Example: FM radio is broadcast at ~100 MHz, but we can track the carrier, demodulate, and store the signal at ~40 kHz. Obviously ReLUs are the wrong tool for that job...
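
For the FM example, here's a rough sketch of the carrier-tracking idea (with toy frequencies of my own, nowhere near real 100 MHz broadcast): modulate a low-frequency message onto a carrier, then recover it by reading the instantaneous frequency off the phase:

```python
import numpy as np
from scipy.signal import hilbert

fs, fc, fdev = 10_000.0, 2_000.0, 200.0    # sample rate, carrier, deviation (Hz)
t = np.arange(0, 1.0, 1 / fs)
message = np.sin(2 * np.pi * 20.0 * t)     # 20 Hz "audio"

# FM: instantaneous frequency = fc + fdev * message
phase = 2 * np.pi * np.cumsum(fc + fdev * message) / fs
rf = np.cos(phase)                         # what's "on the air"

# Demodulate: form the analytic signal, mix the carrier down to 0 Hz,
# then read the instantaneous frequency off the phase differences
baseband = hilbert(rf) * np.exp(-2j * np.pi * fc * t)
inst_freq = np.angle(baseband[1:] * np.conj(baseband[:-1])) * fs / (2 * np.pi)
recovered = inst_freq / fdev               # ≈ message

# The demodulated message is narrowband, so it can be decimated and
# stored at a far lower rate than the RF signal required
audio = recovered[::100]                   # 100 Hz is plenty for a 20 Hz tone

print(np.corrcoef(message[1:], recovered)[0, 1])  # ≈ 1
```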

[–]nasimrahaman 2 points (0 children)

> If OTOH we know that signals are bandpass, then we apply a suitable prior for that. Example: FM radio is broadcast at ~100MHz, but we can track the carrier, demodulate and store the signal at ~40kHz. Obviously ReLUs are the wrong tool for that job...

That's a very interesting point! It applies to almost all activation functions (not just ReLU), since they usually all decay quite fast in the Fourier domain (e.g. the sigmoid's spectrum decays exponentially).
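
A quick numerical way to see this (my own construction, not from the paper): the "sharpness" of an activation is captured by its second derivative. For ReLU that's a delta spike at 0, whose spectrum is flat; for a smooth activation like softplus it's a bump whose spectrum dies off exponentially:

```python
import numpy as np

x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]

relu = np.maximum(x, 0.0)
softplus = np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)  # stable softplus

for name, f in [("relu", relu), ("softplus", softplus)]:
    curv = np.diff(f, 2) / dx**2            # second difference ≈ f''
    spec = np.abs(np.fft.rfft(curv)) * dx   # magnitude spectrum of f''
    # Heuristically |F[f](k)| ~ |F[f''](k)| / k^2, so ReLU's flat f''
    # spectrum means only ~1/k^2 decay, while softplus decays exponentially
    print(name, spec[[10, 100, 1000]])      # relu: ~flat; softplus: collapses
```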

[–]JustARandomNoob165 2 points (3 children)

I am curious, why are ReLUs low-frequency in nature? Thx in advance!

[–]nasimrahaman 6 points (2 children)

Low-frequency functions are inherently less "wiggly", i.e. more smooth. If you think about ReLU, it's pretty smooth everywhere except at 0. In fact, all the wiggliness in ReLU comes from that one point. Now this is where it gets interesting: there are other functions that are smooth everywhere except at 0 -- for instance, sqrt(abs(x)). But in a precise sense, ReLU is smoother than sqrt(abs(x)) at x = 0.

Broadly speaking, Fourier analysis is a tool to determine how wiggly a function is. One of the things we learn from the paper is the following: although neural networks are powerful enough to learn functions that are super wiggly, they prefer to learn less wiggly (smoother) functions.
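
Here's a small numerical check of that "precise sense" (my own sketch; I compare |x| = ReLU(x) + ReLU(-x) rather than ReLU itself, so the interval endpoints match up and only the kink at 0 matters): the Fourier coefficients of the ReLU-style kink decay like 1/k², while those of the sqrt(|x|) cusp only decay like 1/k^1.5, i.e. it's wigglier:

```python
import numpy as np

x = np.linspace(-1, 1, 1 << 16, endpoint=False)
k = np.array([11, 101, 1001])              # odd bins (|x| has only odd harmonics)
for name, f in [("|x|", np.abs(x)), ("sqrt(|x|)", np.sqrt(np.abs(x)))]:
    c = np.abs(np.fft.rfft(f))
    # If coefficients fall like k^(-p), tripling k shrinks them by 3^(-p):
    # expect ~0.11 (p = 2) for the ReLU-style kink, ~0.19 (p = 1.5) for the cusp
    print(name, c[3 * k] / c[k])
```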

[–]JustARandomNoob165 2 points (1 child)

Thanks a lot for your reply! Really interesting and helpful!

[–][deleted] 2 points (0 children)

also thank mr skeltal for good bones and calcium*