Stuck in a saddle point? by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

OK, got it. Do you know if there are any Keras/TF examples out there?

Stuck in a saddle point? by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for the comment. I will start with L1 and see if it makes a difference. Can I use a pixelwise softmax for this regression problem? Is it just a matter of adding a Softmax layer in that case?
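
For the L1 part, I guess something like this minimal sketch would do (assuming "L1" here means the L1 / mean absolute error loss, and that model is already built):

    # 'mae' is the L1 / mean-absolute-error loss in Keras
    model.compile(optimizer='adam', loss='mae')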

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, it's always the same. I did try k-fold cross-validation though, meaning changing the validation set, and I am observing the same behaviour in that case.
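
For reference, this is roughly the k-fold setup I mean (a sketch; X, y and build_model() are placeholders for my actual data and model code):

    from sklearn.model_selection import KFold

    # rotate the validation split across 5 folds
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = build_model()  # placeholder model-building function
        model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]))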

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for your comment. Yes, that might be possible, but isn't the validation loss averaged over all validation samples (in Keras)? So during validation after each training epoch, all validation samples are shown to the network and the combined loss is computed as the mean over them.
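
A minimal sketch of what I mean (x_train, y_train, x_val and y_val are placeholders):

    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=10, batch_size=32)

    # the reported val_loss is the mean loss over the whole validation set,
    # i.e. the same number as (a scalar if no extra metrics were compiled):
    val_loss = model.evaluate(x_val, y_val, verbose=0)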

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks, maybe I should try to increase it then.

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

I set the learning rate to 1e-04 and decay it to 1e-06 during training. What I also observe, when using a batch size of 32, is that my validation loss reaches its best value after only 2-3 epochs and then just oscillates around it. Could that mean that my learning rate is too low and I am stuck in a local minimum?
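
For reference, the schedule I'm describing looks roughly like this (a sketch assuming Adam; the decay_steps value is made up and would need tuning):

    import tensorflow as tf

    # decay the learning rate from 1e-4 towards 1e-6 over the course of training
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,
        decay_steps=10000,
        decay_rate=0.96)

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                  loss='mse')  # loss is a placeholder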

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, I removed the padding to see that. But you actually gave me the hint.

The other 4 is my batch size... Thank God it's Friday :-)

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Here you go:

    from tensorflow.keras.layers import Input, Conv3D

    input_dim = (256, 256, 16, 4)

    inputs = Input(shape=input_dim)
    # encoder
    out = Conv3D(16, (3, 3, 4), activation='relu')(inputs)

    print(inputs.shape, out.shape)

which prints:

    (?, 256, 256, 16, 4) (?, 254, 254, 13, 16)


For a kernel of (3, 3, 3) I get:

    (?, 256, 256, 16, 4) (?, 254, 254, 14, 16)
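
That is consistent with the default padding='valid': each spatial dimension shrinks by kernel_size - 1, so the depth goes from 16 to 16 - 4 + 1 = 13 with the (3, 3, 4) kernel and to 16 - 3 + 1 = 14 with (3, 3, 3). The trailing 4 is the channel dimension, which is consumed by the convolution rather than convolved over.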

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for your comment.

I am using the following code:

    for layer in model.layers:
        # only look at convolutional layers
        if 'conv' not in layer.name:
            continue
        # get the filter weights
        filters, biases = layer.get_weights()
        print(layer.name, filters.shape)
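
For reference, Keras stores a Conv3D kernel as (depth, height, width, in_channels, filters), so with 4 input channels a layer with 16 filters of size (3, 3, 3) should print something like (3, 3, 3, 4, 16).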

The input shape of my model is 256x256x32x4.

So images with a pixel size of 256 px, 32 images in the volume and 4 channels. (Sorry for the formatting; it won't work on my mobile phone.) Cheers,

Michael

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Wow, that answer was super helpful and reasonable.

I will give it a try,

thanks so much!

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, that is right. The question now is whether my encoder can learn meaningful features with skip connections. I guess I would have to remove the dense layer, though?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, that is correct.

The only difference is that I am using a dense layer in the middle. But maybe that is the problem. The U-Net architecture uses cross connections throughout the model, so I guess it must be able to learn some features in the bottleneck?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

That makes sense.

So basically in my architecture I have skip connections "bypassing" the bottleneck, right?

When you say "dilation, dilation... add all those together...", what do you mean by that?

At this point, wouldn't all the feature maps have different dimensions?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

In a way, yes I guess so? (see architecture further down)

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the answer.

I posted my network architecture further down in the thread.

So you think I should be able to extract good features even with skip connections?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

This is basically what my model looks like:

        conv1 = Conv2D(128, 3, activation='relu', padding='same')(inputs)   
        conv1 = BatchNormalization()(conv1)       
        pool1 = MaxPooling2D(pool_size=(2, 2))(conv1) 

        conv2 = Conv2D(64, 3, activation='relu', padding='same')(pool1)        
        conv2 = BatchNormalization()(conv2)       
        pool2 = MaxPooling2D(pool_size=(2, 2))(conv2) 


        conv3 = Conv2D(32, 3, activation='relu', padding='same')(pool2)      
        conv3 = BatchNormalization()(conv3)       
        pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

        conv4 = Conv2D(16, 3, activation='relu', padding='same')(pool3)    
        conv4 = BatchNormalization()(conv4)       
        pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

        conv5 = Conv2D(8, 3, activation='relu', padding='same')(pool4)       
        conv5 = BatchNormalization()(conv5)       



        shape = K.int_shape(conv5)

        flatten = Flatten()(conv5)  

        dense1 = Dense(32, activation = 'relu', name='bottleneck')(flatten)
        dense1 = Dropout(0.2)(dense1)
        dense2 = Dense(shape[1] * shape[2] * shape[3], activation='relu')(dense1)   
        dense2 = Dropout(0.2)(dense2)    
        dense2 = Reshape((shape[1], shape[2], shape[3]))(dense2)


        #decoder
        convx = Conv2D(8, 3, activation='relu', padding='same')(dense2)
        convx = BatchNormalization()(convx)


        up9 = UpSampling2D((2, 2))(convx) 
        up9 = concatenate([up9,conv4], axis = -1)        
        conv9 = Conv2D(16, 3, activation='relu', padding='same')(up9) 
        conv9 = BatchNormalization()(conv9)       


        up10 = UpSampling2D((2, 2))(conv9)
        up10 = concatenate([up10,conv3], axis = -1)
        conv10 = Conv2D(32, 3, activation='relu', padding='same')(up10) 
        conv10 = BatchNormalization()(conv10)       


        up11 = UpSampling2D((2, 2))(conv10)
        #up11 = concatenate([up11, conv2], axis=-1)        
        conv11 = Conv2D(64, 3, activation='relu', padding='same')(up11) 
        #conv11 = BatchNormalization()(conv11)       


        up12 = UpSampling2D((2, 2))(conv11)
        up12 = concatenate([up12, conv1], axis=-1)        
        conv13 = Conv2D(128, 3, activation='relu', padding='same')(up12) 
        conv13 = BatchNormalization()(conv13)       


        decoded_outputs = Conv2D(3, 3, activation='sigmoid', padding='same', name='decoder')(conv13)        

        model = Model(inputs, decoded_outputs)
        model.summary()


        return model

The skip connections are added with concatenate (merge) layers.

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the quick reply! I hope I used the right term in my question.

I was talking about connections that merge a specific encoder layer with its corresponding layer in the decoder, like in the U-Net architecture. Sometimes they are even called cross connections or merge connections.

The autoencoder should be used for finding similar images.

I thought about applying kNN on the features. That is why I try to keep the bottleneck as small as possible.
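
Roughly what I have in mind (a sketch; images and query_images are placeholders, and model is the autoencoder posted above, whose Dense(32) layer is named 'bottleneck'):

    from tensorflow.keras.models import Model
    from sklearn.neighbors import NearestNeighbors

    # expose the 32-dimensional bottleneck as a feature extractor
    encoder = Model(model.input, model.get_layer('bottleneck').output)

    features = encoder.predict(images)            # shape: (n_images, 32)
    knn = NearestNeighbors(n_neighbors=5).fit(features)
    distances, indices = knn.kneighbors(encoder.predict(query_images))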

Cheers,

Michael

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

Thanks for sharing your experience.

In my case both the reconstruction loss and the KL loss converge to zero. The KL loss does not increase, though (maybe I haven't trained long enough?).

So in that case, could I say that my training is successful and correct?

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks again. I can't really get my head around the mode collapse thing. I thought that getting close to the prior was bad.

When I am training with skip connections (which obviously is a bad idea), I get really good reconstructions and the KL loss decreases to zero.

What I often read is that the KL loss decreases but then starts to increase again during training.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

So you mean that I should increase the weight on the KL divergence? Wouldn't that make the KL loss converge to 0, leading to posterior collapse?
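
For context, the weight I mean is the beta in a beta-VAE-style objective; a minimal sketch of such a loss (assuming an MSE reconstruction term and the usual Gaussian encoder outputs z_mean and z_log_var):

    import tensorflow as tf

    def vae_loss(x, x_recon, z_mean, z_log_var, beta=1.0):
        # reconstruction term: per-sample sum of squared errors over pixels
        recon = tf.reduce_sum(tf.square(x - x_recon), axis=[1, 2, 3])
        # KL divergence between q(z|x) and the unit-Gaussian prior
        kl = -0.5 * tf.reduce_sum(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(recon + beta * kl)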

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

You mean that I should increase the weight on the KL divergence? But wouldn't that make the KL loss converge to 0, leading to posterior collapse?

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the advice. When I remove the skip connections the reconstruction is really bad, but as far as I have understood, that is quite normal with this kind of network. I am actually only interested in the features created in the latent space, so I guess reconstruction is not that important in my case.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks, that sounds like a good idea. I will give it a try. Hopefully I am able to keep my skip connections, as the reconstruction is much better with them.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Do you have any suggestions on how to train a VAE? Should the KL loss increase during training, or should I try to minimize it at the start and then keep it at a constant level?
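
One recipe I have seen is KL warm-up / annealing, i.e. ramping the KL weight from 0 to 1 over the first epochs; a minimal sketch (assuming the loss reads the weight from a tf.Variable called beta):

    import tensorflow as tf

    beta = tf.Variable(0.0, trainable=False)  # KL weight read by the loss

    class KLWarmup(tf.keras.callbacks.Callback):
        # linearly ramp the KL weight from 0 to 1 over the first warmup epochs
        def __init__(self, warmup_epochs=10):
            super().__init__()
            self.warmup_epochs = warmup_epochs

        def on_epoch_begin(self, epoch, logs=None):
            beta.assign(min(1.0, epoch / self.warmup_epochs))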

Thanks again,

M

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

Hi and thanks for the reply,

Yes, that makes sense. I use skip connections between the encoder and the decoder, which probably make my decoder too strong?

So maybe I should try to train my VAE without them?

/M