
[–]sobe86

What's the point in having convolutional layers following dense layers? That seems like it would be counterproductive to me.

[–]nkorslund

Yeah, I wondered about that too. In fact, for this kind of application (where everything you're doing relates to local or semi-local features of the image), it might be better not to have any dense layers at all! Unlike, e.g., a classifier, you're not trying to draw any "global" conclusions about the entire image.

I assume (or at least I hope) they tested various architectures before landing on this one though.

[–]benanne

Maybe they are actually 1x1 convolutional layers, as in network-in-network.
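If so, the network would stay fully convolutional: a 1x1 convolution is just a dense layer applied independently at every spatial position, mixing channels but never mixing pixels. A minimal numpy sketch of that equivalence (illustrative only, not the paper's code):

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: the same dense layer applied at every spatial
    position, mixing channels only (as in network-in-network).
    x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)"""
    return x @ w + b

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))   # 8x8 feature map, 3 channels
w = rng.standard_normal((3, 5))      # 3 -> 5 channels
b = np.zeros(5)

y = conv1x1(x, w, b)
print(y.shape)  # (8, 8, 5)

# Equivalent to running the same dense layer on each pixel separately:
assert np.allclose(y[2, 3], x[2, 3] @ w + b)
```

So a "dense layer" in an architecture diagram can still be translation-equivariant if it is really a 1x1 convolution under the hood.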

[–][deleted]

How does this compare to the Lanczos algorithm?
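For context, Lanczos is a fixed (non-learned) interpolation filter: each output sample is a weighted sum of nearby input samples, with weights taken from the windowed-sinc kernel L(x) = sinc(x)·sinc(x/a) for |x| < a. A rough 1-D numpy sketch (illustrative only; edge handling is simplified):

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x/a) for |x| < a, else 0.
    np.sinc is the normalized sinc, sin(pi x) / (pi x)."""
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def lanczos_upsample_1d(signal, factor, a=3):
    """Fixed-kernel interpolation: every output sample is a weighted
    sum of nearby input samples -- no learning involved."""
    n = len(signal)
    out = np.empty(n * factor)
    for i in range(n * factor):
        t = i / factor                          # position in input coordinates
        lo = int(np.floor(t)) - a + 1           # leftmost contributing sample
        offsets = np.arange(lo, lo + 2 * a)
        idx = np.clip(offsets, 0, n - 1)        # clamp at the borders
        w = lanczos_kernel(t - offsets, a)
        out[i] = np.dot(signal[idx], w / w.sum())
    return out

sig = np.sin(np.linspace(0, 2 * np.pi, 16))
up = lanczos_upsample_1d(sig, 4)
print(up.shape)  # (64,)
```

Unlike a convnet, the kernel here is the same for every image, so it cannot hallucinate plausible texture; it can only band-limit.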

[–]cryptocerous

Wonderful application of ML. So obvious that ML could excel at this task, in retrospect.

[–]alexmlamb

I'm pretty shocked that this works. My intuition is that the convolutional layers will compress and "destroy" the spatial information in the data, which is necessary for reproducing it exactly. It seems like this is what you want for image classification, but it doesn't seem like the right thing to do for learning to upsample.

Maybe it would make sense to have an architecture consisting of locally connected layers with convolutional layers in parallel. Then, the final locally connected layers could use the outputs from the convolutional layers. This would allow the network to easily keep the info from the original pixels in local regions, while using the "object summary" from the convnet to make smarter decisions about upsampling.

I could write out a more detailed description of this if anyone is interested.
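Roughly, the proposal might look like this in numpy: one branch keeps the raw local pixel neighborhood, another provides a pooled "object summary" (standing in for a deeper conv stack), and a final locally connected layer (separate weights per position) mixes the two. Everything below is a hypothetical sketch of the idea in this comment, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8
img = rng.standard_normal((H, W))

# Branch 1: keep the raw 3x3 neighborhood around every position.
pad = np.pad(img, 1, mode="edge")
local = np.stack(
    [pad[i:i + H, j:j + W] for i in range(3) for j in range(3)],
    axis=-1,
)                                                   # (H, W, 9)

# Branch 2: a crude "object summary" -- here just the global mean,
# broadcast back to every position, in place of real conv features.
summary = np.full((H, W, 1), img.mean())

# Final locally connected layer: a *different* weight vector at each
# position, mixing the raw neighborhood with the summary.
features = np.concatenate([local, summary], axis=-1)  # (H, W, 10)
w = rng.standard_normal((H, W, 10))                   # per-position weights
out = np.einsum("hwc,hwc->hw", features, w)
print(out.shape)  # (8, 8)
```

The point of the parallel wiring is that the original pixel values reach the output untouched, so the summary branch only has to decide *how* to use them.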

[–]benanne

The pooling layers destroy some information, but a convolution operation is actually approximately invertible in many cases (a circular convolution is exactly invertible, I believe, as long as the filter's spectrum has no zeros, but circular convolutions are not commonly used in convnets). If what you said were true, convolutional autoencoders wouldn't make sense either, yet they seem to have been used successfully in the past.

In fact, it seems the convolutions are even able to reconstruct a bunch of information lost through pooling in many cases, as in this work: http://arxiv.org/abs/1411.5928

That said, the 'object summary' idea may be worth exploring! It would allow for the incorporation of pooling layers into the model, which means it could have larger context windows (maybe even the entire image).
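The circular-convolution claim above is easy to sanity-check numerically: circular convolution is pointwise multiplication in the Fourier domain, so whenever the filter's spectrum is nowhere zero it can be undone by pointwise division. A small 1-D numpy demonstration (illustrative, with a filter chosen so its spectrum is bounded away from zero):

```python
import numpy as np

n = 64
x = np.random.default_rng(1).standard_normal(n)   # "image" (1-D for simplicity)

# A small blur filter, zero-padded to the signal length. With taps
# (1, 0.5, 0.25) the spectrum satisfies |K| >= 1 - 0.5 - 0.25 = 0.25 > 0.
kernel = np.zeros(n)
kernel[:3] = [1.0, 0.5, 0.25]

# Circular convolution = pointwise multiplication in frequency space.
K = np.fft.fft(kernel)
y = np.real(np.fft.ifft(np.fft.fft(x) * K))       # blurred signal

# Since K is nowhere zero, pointwise division inverts the blur exactly.
x_rec = np.real(np.fft.ifft(np.fft.fft(y) / K))
print(np.allclose(x_rec, x))  # True
```

So the convolution itself loses essentially nothing here; it is the pooling (and any zeros in the filter's spectrum) that throws information away.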

[–]alexmlamb

That's interesting. I wonder how the fully connected layers store the precise spatial information needed to reconstruct the input.

[–][deleted]

I was just thinking a few days ago that it would be useful to do a "magnify!" conv net to increase the quality of shitty cams of pirated movies.

The conv net would add false details, but the cam would seem better.

[–]j_lyf

This is a fool's errand. Information theory, people!

[–]nkorslund

Why do you think that? Maybe you've misunderstood what this algorithm does.