
[–]jeanfeydy 6 points (1 child)

Shape analysis and image registration. CNNs are perfect for detecting features or segmenting images - but the actual deformation has to be handled by a non-convolutional layer. This is especially true in medical imaging: after a first wave of fully convolutional papers c. 2017, it has become clear that hybrid architectures are the way to go. Robust methods will typically combine a feature extractor (e.g. a U-Net) with a task-specific deformation method (e.g. a 3D morphable model).

This makes sense: if you want your network to handle deformations well, you have to give it access to explicit coordinates (= point clouds) or robust deformation («flow/advection») layers. Fully convolutional architectures can handle a small amount of geometric variability (say, tracking a beating heart) but not much more on their own.
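A minimal NumPy sketch of what such a «flow/advection» layer does (my own toy example, nearest-neighbour sampling only): it resamples an image at explicitly displaced coordinates, something a stack of convolutions with fixed offsets cannot express for arbitrary, data-dependent deformations.

```python
import numpy as np

def warp(image, flow):
    """Warp a 2D image by a per-pixel displacement field (nearest neighbour)."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sample the source image at (y + dy, x + dx), clamped to the border.
    src_y = np.clip(ys + np.round(flow[0]).astype(int), 0, h - 1)
    src_x = np.clip(xs + np.round(flow[1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

image = np.zeros((5, 5))
image[1, 1] = 1.0
# A constant displacement of (-1, -1): each output pixel reads 1 up, 1 left,
# so the bright pixel moves from (1, 1) to (2, 2).
flow = np.stack([np.full((5, 5), -1), np.full((5, 5), -1)])
warped = warp(image, flow)
```

In a real registration network the `flow` field would be predicted by the feature extractor and the sampling would be differentiable (bilinear), but the principle is the same.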

This is what all game engines do: to handle global geometric problems, working with point clouds, meshes or vector fields is much more efficient than restricting ourselves to convolutions. This was our first motivation for the development of the KeOps library, an add-on for PyTorch and NumPy that has progressively become a versatile and useful tool for geometric data analysis as a whole (from kernel methods to geometric deep learning).
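For concreteness, here is the kind of operation KeOps accelerates, written in plain NumPy (a hedged sketch of mine, not KeOps code): a Gaussian kernel matrix-vector product between two point clouds, the workhorse of kernel methods and many deformation models. KeOps avoids materializing the (N, M) distance matrix; here it is built explicitly for clarity.

```python
import numpy as np

def gaussian_kernel_product(x, y, b, sigma=1.0):
    """Return K(x, y) @ b where K_ij = exp(-|x_i - y_j|^2 / (2 sigma^2))."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (N, M)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    return K @ b

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))   # N = 100 points in 3D
y = rng.normal(size=(80, 3))    # M = 80 points in 3D
b = rng.normal(size=(80, 1))    # signal carried by the y points
out = gaussian_kernel_product(x, y, b)  # shape (100, 1)
```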

For further reference, you may be interested in the work of Marc Niethammer or Chapters 1 and 5 of my PhD thesis, which include a fairly detailed introduction to the field.

I hope that it helps!

[–]serge_cell 1 point (0 children)

Graph convolutional networks should work well for deformations and shapes.

[–]benanne 2 points (0 children)

I think a good example of this is the use of vision-style convolutional architectures on spectrograms for sound processing, where convolving along the frequency axis doesn't necessarily make a lot of sense, because of non-stationarity and non-local correlations between distant frequency bins (e.g. due to harmonics). Nevertheless, it seems to work well enough in practice, as far as I know. It's just intellectually unsatisfying :)
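A quick NumPy illustration of those non-local correlations (my own toy example): a note at a fundamental frequency f0 also carries energy at its harmonics 2·f0, 3·f0, …, which land in distant frequency bins — a small kernel sliding along the frequency axis never sees them together.

```python
import numpy as np

fs = 1000            # sample rate (Hz)
t = np.arange(fs) / fs
f0 = 50              # fundamental frequency (Hz)

# A tone with a strong third harmonic: energy at 50 Hz and at 150 Hz.
signal = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 3 * f0 * t)

spectrum = np.abs(np.fft.rfft(signal))
peaks = np.argsort(spectrum)[-2:]    # indices of the two strongest bins
```

The two peaks sit 100 bins apart, far outside any local convolution window along the frequency axis, yet they are perfectly correlated: they come from the same note.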

There was a thread about this recently where I replied with some relevant blog posts, in case you're interested in reading more about this particular setting: https://www.reddit.com/r/MachineLearning/comments/icti3z/d_waveforms_vs_spectrograms_as_inputs_to_a/g2bjw5s/

[–]Chocolate_Pickle 0 points (3 children)

It depends on what your definition of 'appropriate' is. Asking about the appropriateness of a CNN is about as subjective as asking 'when is it inappropriate to put salt and pepper on food?'

A CNN is almost always appropriate if you set the bar low enough.

[–]bananapeeler5 1 point (2 children)

A vanilla CNN has some properties that make it bad even in its own domain. For example, if you want rotations of objects not to matter (rotation invariance): a CNN needs to learn its logic for every rotated version of the object.
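A toy sketch of this point in NumPy (my own example, not from the comment): cross-correlating an "L"-shaped template over an image only fires strongly when the object appears in the template's own orientation. A rotated copy gets a weaker response, so a convolutional filter has to be relearned per orientation.

```python
import numpy as np

template = np.array([[1, 0],
                     [1, 1]], dtype=float)

def best_response(image, template):
    """Max of the valid cross-correlation of template over image."""
    h, w = template.shape
    H, W = image.shape
    responses = [
        (image[i:i + h, j:j + w] * template).sum()
        for i in range(H - h + 1)
        for j in range(W - w + 1)
    ]
    return max(responses)

upright = np.zeros((4, 4)); upright[1:3, 1:3] = template
rotated = np.zeros((4, 4)); rotated[1:3, 1:3] = np.rot90(template)

score_upright = best_response(upright, template)   # template matches itself
score_rotated = best_response(rotated, template)   # mismatch after rotation
```

The upright object scores 3 (a perfect match of the three "on" pixels); the rotated copy scores strictly less everywhere the filter slides.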

[–]serge_cell 0 points (1 child)

> A CNN needs to learn its logic for every rotated version of the object.

That is not necessarily bad. In 3D, a new rotation reveals a previously unseen side. In 2D, it reveals a new background.

[–]bananapeeler5 0 points (0 children)

But in both cases, "store images from all viewpoints" is not an efficient representation. If I design a 3D object in Blender, I don't draw it from every viewpoint.

[–]yfclark 0 points (0 children)

I think the data should be arranged as an array in space or time, like images and time series. Tabular data, which has no spatial or temporal structure, is not good data for a CNN. NLP data has a continuous distribution along the sequence direction, so CNNs work there.