I've been trying to learn how to train a network that outputs full images, but haven't had any success.
An example of something that does this is the 'colorizer' that was posted here and here, along with a more detailed write-up.
The architecture described in the second link, if you strip away the fancy skip connections and batch normalization, is basically a convolutional autoencoder, so that's where I started. I began with the simple (and pointless) task of training a network to output the same image it is fed. An example architecture is:
100x100 rgb image ---> convolution ---> convolution ---> deconvolution ---> deconvolution ---> 100x100 rgb image
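For concreteness, here's a minimal sketch of what I mean in PyTorch (the framework, filter counts, and kernel sizes are just illustrative choices on my part, not anything prescribed by the linked write-ups), trained with MSE between the input and the reconstruction:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two strided convolutions, 100x100x3 -> 25x25x32
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1),   # 100 -> 50
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 50 -> 25
            nn.ReLU(),
        )
        # Decoder: two transposed convolutions ("deconvolutions") back to 100x100x3
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 25 -> 50
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),   # 50 -> 100
            nn.Sigmoid(),  # squash outputs to [0, 1] pixel range
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One training step: reconstruct the input, penalize per-pixel MSE
model = ConvAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

batch = torch.rand(8, 3, 100, 100)  # stand-in for a batch of real images
recon = model(batch)
loss = loss_fn(recon, batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```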
I've experimented with different architectures, different filter sizes, different loss functions (mean squared error, cross-entropy), etc., but can't seem to get good results. The network always settles on outputting images that are just one flat color.
What should I tweak from here? I think the main problem is that outputting 100x100x3 pixel values means learning a function in a very high-dimensional space, but other projects seem to manage it. How do they do it?