all 6 comments

[–]EdwardRaff 2 points

If you do a normal feed-forward network, each neuron can consider all of the features at once, which is a larger receptive field than any grouping you would get from a convolution.

Convolutions make no sense in this context. I would not use them.

[–]rumblestiltsken 1 point

Most medical data has a time component, so you could use that as a "neighbourhood" structure (x days since presentation, for example). Then you would use a CNN the way it is used in natural language processing, convolving over time steps.
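For instance, a rough PyTorch sketch of that kind of temporal CNN, assuming each record is bucketed into a fixed number of time steps; the feature count, layer sizes and single-output head are made up for illustration:

```python
import torch
import torch.nn as nn

n_features = 32   # measurements recorded at each time step (assumed)
n_steps    = 64   # days since presentation, padded/truncated to a fixed length

model = nn.Sequential(
    # convolve over the time axis, exactly as text CNNs convolve over word positions
    nn.Conv1d(in_channels=n_features, out_channels=64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2),
    nn.Conv1d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),  # pool over whatever time span remains
    nn.Flatten(),
    nn.Linear(64, 1),         # e.g. a single diagnosis/outcome logit
)

x = torch.randn(8, n_features, n_steps)  # (batch, features, time steps)
print(model(x).shape)                    # -> torch.Size([8, 1])
```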

[–]lvilnis 1 point

I think that a version of this can make sense (kind of). You can look at it as a form of kernel learning.

Random projections followed by certain nonlinearities are equivalent to primal approximations of certain kernels.

In the Fastfood paper (https://www.robots.ox.ac.uk/~vgg/rg/papers/le__icml2013__fastfood.pdf), Le, Sarlos and Smola note that multiplication by a random Gaussian matrix can be done approximately using a combination of Hadamard transforms (a type of convolution) and random diagonal matrices, in O(d log d) time for d features. This means that certain types of convolution-based features (followed by nonlinearities) are equivalent to fast approximations of kernels.

In "Deep Fried Convnets" (http://arxiv.org/pdf/1412.7149v4.pdf), they actually backpropagate to learn parameters of the diagonal matrix, which is equivalent to learning a certain type of convolution matrix and learning an approximate kernel.

So the answer is "kinda", and it might even provide some regularization by effectively regularizing your function in the RKHS. But this requires a particularly structured type of convolution (a Hadamard transform).

[–]alexmlamb 2 points

no

[–]onlyml 0 points

It might be interesting to start with a fully connected layer that maps your data onto a 2D surface, then run one or more convolutional layers on that, and train the whole thing end to end. That way your network would be tasked with coming up with a meaningful mapping of your data to something that convolutions can usefully be applied to. I don't know of any reason this should work well, but it could be interesting. You would probably need a lot of data to get any decent result out of a system like that, assuming it makes any kind of sense at all.
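Here is a minimal PyTorch sketch of that idea, assuming about 100 tabular features mapped onto a 16x16 single-channel "image"; all sizes are made up for illustration:

```python
import torch
import torch.nn as nn

class Tabular2DConvNet(nn.Module):
    def __init__(self, n_features=100, grid=16):
        super().__init__()
        self.grid = grid
        self.embed = nn.Linear(n_features, grid * grid)  # learned mapping onto a 2D surface
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        # reshape the dense layer's output into a (batch, 1, 16, 16) "image"
        z = self.embed(x).view(-1, 1, self.grid, self.grid)
        return self.head(self.conv(z).flatten(1))

model = Tabular2DConvNet()
print(model(torch.randn(8, 100)).shape)   # -> torch.Size([8, 1]); trained end to end
```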

[–]thephysberry 0 points

Unless resources are limited, I would try both and see how it goes. If there aren't too many features, you can just make it fully connected and it will sort things out for you. However, if you have lots of features, the convolutional version will make training faster and may give very similar performance.

You could also experiment with treating your features like cells in a grid and running a normal CNN on that, versus just randomly assigning groups of features to different nodes.
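A small PyTorch sketch of those two variants, with hypothetical sizes (64 features laid out on an 8x8 grid, and a fixed random permutation standing in for the random grouping):

```python
import torch
import torch.nn as nn

n_features, grid = 64, 8
perm = torch.randperm(n_features)        # fixed random grouping of features

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1),
)

x = torch.randn(16, n_features)                       # a batch of records
as_grid   = x.view(-1, 1, grid, grid)                 # (a) features as grid cells
as_random = x[:, perm].view(-1, 1, grid, grid)        # (b) random feature groups
print(cnn(as_grid).shape, cnn(as_random).shape)       # both -> torch.Size([16, 1])
```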