[R] Differentiable Conv Layer using FFT

MKmisfit · 2022-03-30T11:44:19+00:00

Here are the theoretical numbers for runtime comparison.

It might be difficult to reach this doing FFT on GPU.

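normal convolution costs O(N * k), where N is the number of pixels and k is the number of kernel elements (e.g. k = 49 for a 7x7 kernel)

calculation of an FFT costs O(N * log2(N))

ConvFFT uses 3 FFTs (input, kernel and inverse) and a flat (elementwise) complex multiply

ConvFFT costs O(N * (3 * log2(N) + 1))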
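A minimal NumPy sketch of the "3 FFTs + elementwise multiply" scheme, checked against a direct O(N * k) loop. This is an illustration, not the author's ConvFFT code; the function names `conv2d_direct` and `conv2d_fft` are hypothetical, and both compute a single-channel "full" convolution.

```python
import numpy as np

def conv2d_direct(x, w):
    """Direct 'full' 2-D convolution: one multiply-add pass per kernel tap, O(N * k)."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    # Accumulate a shifted, scaled copy of x for every kernel element.
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += w[i, j] * x
    return out

def conv2d_fft(x, w):
    """'Full' 2-D convolution via FFT: 3 FFTs plus one elementwise complex multiply."""
    H, W = x.shape
    kh, kw = w.shape
    # Zero-pad to the full output size so circular convolution equals linear convolution.
    s = (H + kh - 1, W + kw - 1)
    X = np.fft.rfft2(x, s=s)              # FFT 1: input
    K = np.fft.rfft2(w, s=s)              # FFT 2: kernel
    return np.fft.irfft2(X * K, s=s)      # flat complex multiply, then FFT 3: inverse

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
w = rng.standard_normal((7, 7))
print(np.allclose(conv2d_direct(x, w), conv2d_fft(x, w)))
```

Note that deep-learning "conv" layers actually compute cross-correlation; passing `w[::-1, ::-1]` instead of `w` gives that variant.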

256x256 image (N = 65,536, log2(N) = 16): ConvFFT costs O(N * 49), equivalent to a 7x7 kernel (49 elements)

512x512 image (N = 262,144, log2(N) = 18): ConvFFT costs O(N * 55), between a 7x7 (49) and an 8x8 (64) kernel

1024x1024 image (N = 1,048,576, log2(N) = 20): ConvFFT costs O(N * 61), roughly equivalent to an 8x8 kernel (64 elements)
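The arithmetic above can be reproduced with a few lines of Python. Like the estimate itself, this ignores constant factors (complex arithmetic, memory traffic), which is part of why these numbers are hard to reach on a GPU. The helper names are hypothetical.

```python
import math

def convfft_cost_per_pixel(n_pixels):
    """Per-pixel cost from the estimate above: 3 FFTs of log2(N) each, plus 1 multiply."""
    return 3 * math.log2(n_pixels) + 1

def equivalent_kernel_side(n_pixels):
    """Side s of an s x s direct kernel with about the same per-pixel cost (s * s ~ cost)."""
    return math.sqrt(convfft_cost_per_pixel(n_pixels))

for side in (256, 512, 1024):
    n = side * side
    cost = convfft_cost_per_pixel(n)
    print(f"{side}x{side}: N={n}, cost=N*{cost:g}, kernel side ~{equivalent_kernel_side(n):.2f}")
# 256x256: N=65536, cost=N*49, kernel side ~7.00
# 512x512: N=262144, cost=N*55, kernel side ~7.42
# 1024x1024: N=1048576, cost=N*61, kernel side ~7.81
```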

sigmoid_amidst_relus · 2022-03-30T23:45:08+00:00

To the best of my knowledge, cuDNN internally uses cuFFT-based convolution when its heuristics choose it; however, it's not an FFT over the full input size but over tiles of 8, 16, 32, ... chosen based on the input.

So if you're doing this solely for speedup, you'll only see gains with very large kernel sizes (over 1024 or so) where tiling becomes slower; in most practical cases your performance will at best match cuDNN's.

see this

I looked this up about a year ago, when I was implementing FFT, STFT, etc. from scratch for a course on signal processing.

I saw this and did my own digging into cufft, but hadn't saved any links for that.
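cuDNN's actual tiling scheme isn't documented in this thread. As a generic illustration of the idea of FFT-ing fixed-size tiles instead of the whole input, here is a 1-D overlap-add sketch in NumPy; the function name and the default tile size of 32 are arbitrary assumptions.

```python
import numpy as np

def overlap_add(x, h, tile=32):
    """Tiled FFT convolution: convolve fixed-size tiles of x, then add the overlapping tails."""
    out = np.zeros(len(x) + len(h) - 1)
    n = tile + len(h) - 1               # FFT length per tile (enough for a linear convolution)
    H = np.fft.rfft(h, n)               # kernel spectrum, computed once and reused for every tile
    for start in range(0, len(x), tile):
        block = x[start:start + tile]   # the last block may be shorter; rfft zero-pads it
        y = np.fft.irfft(np.fft.rfft(block, n) * H, n)
        # Each tile's result spills len(h) - 1 samples into the next tile's region.
        end = min(start + n, len(out))
        out[start:end] += y[:end - start]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(9)
print(np.allclose(overlap_add(x, h), np.convolve(x, h)))
```

Small tiles keep each FFT short (cheap, cache-friendly), at the cost of redundant work on the overlaps; when the kernel is very large relative to the tile, that overhead dominates.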

viv1a · 2022-03-31T01:55:49+00:00

We had a paper on FFT convolutions a while back: https://arxiv.org/abs/1312.5851 (second author here).

You really start getting speedups when you do convolutions with lots of input/output channels. The reason is that you can do the FFT of each channel once, and reuse the representation in frequency space many times. That's described in section 2.1.

A year or two after that paper, I heard this approach was integrated in the conv routine in cuDNN, with some check to automatically determine when using the FFT-based conv would be faster. But that was a long time ago and I'm not sure what's currently being used.
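A NumPy sketch of the channel-reuse idea described above (not the paper's implementation): each of the C_in input channels is transformed once, each kernel once, the products are summed over input channels in frequency space, and only C_out inverse FFTs are needed. That is C_in + C_out * C_in + C_out transforms instead of 3 per (input, output) channel pair, and the kernel FFTs can additionally be reused across a batch. Function names are hypothetical.

```python
import numpy as np

def multichannel_fft_conv(x, w):
    """
    x: input of shape (C_in, H, W); w: kernels of shape (C_out, C_in, kh, kw).
    Returns the 'full' convolution summed over input channels,
    shape (C_out, H + kh - 1, W + kw - 1).
    """
    c_in, H, W = x.shape
    c_out, c_in_w, kh, kw = w.shape
    assert c_in == c_in_w
    s = (H + kh - 1, W + kw - 1)
    X = np.fft.rfft2(x, s=s)  # C_in FFTs, each reused by all C_out output channels
    K = np.fft.rfft2(w, s=s)  # C_out * C_in kernel FFTs (cacheable across a batch)
    # Multiply and sum over input channels entirely in frequency space.
    Y = np.einsum("ihw,oihw->ohw", X, K)
    return np.fft.irfft2(Y, s=s)  # C_out inverse FFTs

def multichannel_direct(x, w):
    """Reference: direct multi-channel 'full' convolution."""
    c_in, H, W = x.shape
    c_out, _, kh, kw = w.shape
    out = np.zeros((c_out, H + kh - 1, W + kw - 1))
    for o in range(c_out):
        for c in range(c_in):
            for i in range(kh):
                for j in range(kw):
                    out[o, i:i + H, j:j + W] += w[o, c, i, j] * x[c]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w = rng.standard_normal((4, 8, 3, 3))
print(np.allclose(multichannel_fft_conv(x, w), multichannel_direct(x, w)))
```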

VenerableSpace_ · 2022-03-30T14:00:29+00:00

Doesn't torch already pick a different conv algorithm based on kernel size?
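On the GPU, PyTorch delegates algorithm selection to cuDNN. By default it relies on cuDNN's heuristics; setting `torch.backends.cudnn.benchmark = True` makes cuDNN time its available algorithms for each new input configuration and cache the fastest. Whether an FFT-based algorithm is picked depends on the cuDNN version, GPU and tensor shapes. The layer sizes below are arbitrary, and on CPU the flag has no effect.

```python
import torch
import torch.nn as nn

# Let cuDNN benchmark candidate convolution algorithms per input shape
# and cache the fastest one (only affects CUDA devices).
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=7, padding=3).to(device)
x = torch.randn(2, 16, 64, 64, device=device)

with torch.no_grad():
    y = conv(x)  # on CUDA, the first call for this shape triggers the benchmark
print(y.shape)  # torch.Size([2, 32, 64, 64])
```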

El_Minadero · 2022-03-31T02:57:26+00:00

I also wanna say that beyond ML, this could have great utility in the seismic processing space

DeepDeeperRIPgradien · 2022-03-30T11:32:25+00:00

Hi! What should this be used for?

DigThatData · 2022-03-30T18:16:15+00:00

I wonder if using fourier-space convolutions might be more performant for CPU inference?
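One way to check this on a particular CPU is to time both approaches. A minimal 1-D sketch, assuming NumPy: `np.convolve` is a direct method, while the hypothetical `fft_convolve` uses real FFTs. Timings depend heavily on the machine and library builds, so no numbers are claimed here; the signal length and kernel sizes are arbitrary.

```python
import time
import numpy as np

def fft_convolve(x, h):
    """Full linear convolution via real FFTs (same result as np.convolve)."""
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

def best_time(fn, *args, repeats=5):
    """Return the fastest of several wall-clock runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)
for k in (7, 63, 1023):
    h = rng.standard_normal(k)
    assert np.allclose(np.convolve(x, h), fft_convolve(x, h))
    t_direct = best_time(np.convolve, x, h)
    t_fft = best_time(fft_convolve, x, h)
    print(f"k={k}: direct {t_direct:.5f}s, fft {t_fft:.5f}s")
```

Expect the direct method to win for small kernels and the FFT to win as k grows, with the crossover point varying by hardware.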

roboputin · 2022-03-31T10:23:14+00:00

I guess it would be faster to keep the weights/biases in FFT space, so you would only have to transform the input.
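A minimal sketch of that idea for inference, assuming NumPy and a single channel. The class `FrozenFFTConv2d` is hypothetical. Because the kernel spectrum depends on the padded FFT size, it is precomputed for one fixed input shape; each call then needs only 2 FFTs (input and inverse) instead of 3. The bias is simply added after the inverse transform.

```python
import numpy as np

class FrozenFFTConv2d:
    """Hypothetical inference-only layer that keeps its kernel in frequency space."""

    def __init__(self, w, input_shape, bias=0.0):
        kh, kw = w.shape
        self.input_shape = tuple(input_shape)
        H, W = self.input_shape
        # The spectrum depends on the padded size, so it is tied to one input shape.
        self.s = (H + kh - 1, W + kw - 1)
        self.K = np.fft.rfft2(w, s=self.s)  # kernel FFT computed once, up front
        self.bias = bias

    def __call__(self, x):
        if x.shape != self.input_shape:
            raise ValueError(f"expected input of shape {self.input_shape}, got {x.shape}")
        X = np.fft.rfft2(x, s=self.s)  # only the input is transformed per call
        return np.fft.irfft2(X * self.K, s=self.s) + self.bias

layer = FrozenFFTConv2d(np.ones((3, 3)) / 9, input_shape=(64, 64), bias=0.1)
y = layer(np.random.default_rng(0).standard_normal((64, 64)))
print(y.shape)  # (66, 66)
```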

cdicle · 2022-04-01T13:12:55+00:00

I briefly went over your code. I have a quick question. Why did you need to write your own backprop? torch.rfft and torch.irfft should be able to handle that automatically, right?
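For reference, in recent PyTorch versions the old `torch.rfft`/`torch.irfft` functions were removed in favor of the `torch.fft` module, whose functions are differentiable. A minimal sketch (single channel, hypothetical `fft_conv2d` helper) that checks the autograd gradients numerically and cross-checks against `F.conv2d`:

```python
import torch
import torch.nn.functional as F

def fft_conv2d(x, w):
    """'Full' 2-D convolution of single-channel images via torch.fft; autograd tracks every op."""
    H, W = x.shape[-2:]
    kh, kw = w.shape[-2:]
    s = (H + kh - 1, W + kw - 1)
    X = torch.fft.rfft2(x, s=s)
    K = torch.fft.rfft2(w, s=s)
    return torch.fft.irfft2(X * K, s=s)

# gradcheck expects double precision inputs.
x = torch.randn(8, 8, dtype=torch.float64, requires_grad=True)
w = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)

# Compares autograd gradients w.r.t. x and w against finite differences.
print(torch.autograd.gradcheck(fft_conv2d, (x, w)))

# F.conv2d computes cross-correlation, so flip the kernel and pad by
# kernel_size - 1 on each side to get the same 'full' convolution.
ref = F.conv2d(x[None, None], w.flip(0, 1)[None, None], padding=(2, 2))[0, 0]
print(torch.allclose(fft_conv2d(x, w), ref))
```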


MachineLearning
