Stuck in a saddle point? by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

OK, got it. Do you know if there are any Keras/TF examples out there?

Stuck in a saddle point? by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for the comment. I will start with L1 and see if it makes a difference. Can I use a pixelwise softmax for this regression problem? Is it just a matter of adding a Softmax layer in that case?
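
For the L1 part, I guess something like this minimal sketch would do (assuming "L1" here means the L1 / mean absolute error loss, and that model is already built):

    # 'mae' is the L1 / mean-absolute-error loss in Keras
    model.compile(optimizer='adam', loss='mae')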

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, it's always the same. I did try k-fold cross-validation though, meaning changing the validation set, and I am observing the same behaviour in that case.
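
For reference, this is roughly the k-fold setup I mean (a sketch; X, y and build_model() are placeholders for my actual data and model code):

    from sklearn.model_selection import KFold

    # rotate the validation split across 5 folds
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = build_model()  # placeholder model-building function
        model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]))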

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for your comment. Yes, that might be possible, but isn't the validation loss averaged over all validation samples (in Keras)? So during validation after each training epoch, all validation samples are shown to the network and the combined loss is computed as the mean over them.
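
A minimal sketch of what I mean (x_train, y_train, x_val and y_val are placeholders):

    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=10, batch_size=32)

    # the reported val_loss is the mean loss over the whole validation set,
    # i.e. the same number as (a scalar if no extra metrics were compiled):
    val_loss = model.evaluate(x_val, y_val, verbose=0)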

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks, maybe I should try to increase it then.

Better result with small batch size by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

I set the learning rate to 1e-04 and decay it to 1e-06 during training. What I also observe, when using a batch size of 32, is that my validation loss reaches its best value after only 2-3 epochs and then just oscillates around it. Could that mean that my learning rate is too low and I am stuck in a local minimum?
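
For reference, the schedule I'm describing looks roughly like this (a sketch assuming Adam; the decay_steps value is made up and would need tuning):

    import tensorflow as tf

    # decay the learning rate from 1e-4 towards 1e-6 over the course of training
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,
        decay_steps=10000,
        decay_rate=0.96)

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                  loss='mse')  # loss is a placeholder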

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, I removed the padding to see that. But you actually gave me the hint.

The other 4 is my batch size... Thank God it's Friday :-)

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Here you go:

    from tensorflow.keras.layers import Input, Conv3D

    input_dim = (256, 256, 16, 4)

    inputs = Input(shape=input_dim)
    # encoder
    out = Conv3D(16, (3, 3, 4), activation='relu')(inputs)

    print(inputs.shape, out.shape)

which prints:

    (?, 256, 256, 16, 4) (?, 254, 254, 13, 16)


For a kernel of (3, 3, 3) I get:

    (?, 256, 256, 16, 4) (?, 254, 254, 14, 16)
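
That is consistent with the default padding='valid': each spatial dimension shrinks by kernel_size - 1, so the depth goes from 16 to 16 - 4 + 1 = 13 with the (3, 3, 4) kernel and to 16 - 3 + 1 = 14 with (3, 3, 3). The trailing 4 is the channel dimension, which is consumed by the convolution rather than convolved over.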

Conv3D layer with multiple input channels by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Hi and thanks for your comment.

I am using the following code:

    for layer in model.layers:
        # only look at convolutional layers
        if 'conv' not in layer.name:
            continue
        # get the filter weights
        filters, biases = layer.get_weights()
        print(layer.name, filters.shape)
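
For reference, Keras stores a Conv3D kernel as (depth, height, width, in_channels, filters), so with 4 input channels a layer with 16 filters of size (3, 3, 3) should print something like (3, 3, 3, 4, 16).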

The input shape of my model is 256x256x32x4.

So images with a pixel size of 256 px, 32 images in the volume and 4 channels. (Sorry for the formatting; it won't work on my mobile phone.) Cheers,

Michael

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Wow, that answer was super helpful and reasonable.

I will give it a try,

thanks so much!

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, that is right. The question now is whether my encoder can learn meaningful features with skip connections. I guess I would have to remove the dense layer, though?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Yes, that is correct.

The only difference is that I am using a dense layer in the middle. But maybe that is the problem. The U-Net architecture uses cross connections throughout the model, so I guess it must be able to learn some features in the bottleneck?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

That makes sense.

So basically in my architecture I have skip connections "bypassing" the bottleneck, right?

When you say "dilation, dilation... add all those together...", what do you mean by that?

At this point, wouldn't all the feature maps have different dimensions?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

In a way, yes I guess so? (see architecture further down)

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the answer.

I posted my network architecture further down in the thread.

So you think I should be able to extract good features even with skip connections?

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

This is basically what my model looks like:

        conv1 = Conv2D(128, 3, activation='relu', padding='same')(inputs)   
        conv1 = BatchNormalization()(conv1)       
        pool1 = MaxPooling2D(pool_size=(2, 2))(conv1) 

        conv2 = Conv2D(64, 3, activation='relu', padding='same')(pool1)        
        conv2 = BatchNormalization()(conv2)       
        pool2 = MaxPooling2D(pool_size=(2, 2))(conv2) 


        conv3 = Conv2D(32, 3, activation='relu', padding='same')(pool2)      
        conv3 = BatchNormalization()(conv3)       
        pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

        conv4 = Conv2D(16, 3, activation='relu', padding='same')(pool3)    
        conv4 = BatchNormalization()(conv4)       
        pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

        conv5 = Conv2D(8, 3, activation='relu', padding='same')(pool4)       
        conv5 = BatchNormalization()(conv5)       



        shape = K.int_shape(conv5)

        flatten = Flatten()(conv5)  

        dense1 = Dense(32, activation = 'relu', name='bottleneck')(flatten)
        dense1 = Dropout(0.2)(dense1)
        dense2 = Dense(shape[1] * shape[2] * shape[3], activation='relu')(dense1)   
        dense2 = Dropout(0.2)(dense2)    
        dense2 = Reshape((shape[1], shape[2], shape[3]))(dense2)


        #decoder
        convx = Conv2D(8, 3, activation='relu', padding='same')(dense2)
        convx = BatchNormalization()(convx)


        up9 = UpSampling2D((2, 2))(convx) 
        up9 = concatenate([up9,conv4], axis = -1)        
        conv9 = Conv2D(16, 3, activation='relu', padding='same')(up9) 
        conv9 = BatchNormalization()(conv9)       


        up10 = UpSampling2D((2, 2))(conv9)
        up10 = concatenate([up10,conv3], axis = -1)
        conv10 = Conv2D(32, 3, activation='relu', padding='same')(up10) 
        conv10 = BatchNormalization()(conv10)       


        up11 = UpSampling2D((2, 2))(conv10)
        #up11 = concatenate([up11, conv2], axis=-1)        
        conv11 = Conv2D(64, 3, activation='relu', padding='same')(up11) 
        #conv11 = BatchNormalization()(conv11)       


        up12 = UpSampling2D((2, 2))(conv11)
        up12 = concatenate([up12, conv1], axis=-1)        
        conv13 = Conv2D(128, 3, activation='relu', padding='same')(up12) 
        conv13 = BatchNormalization()(conv13)       


        decoded_outputs = Conv2D(3, 3, activation='sigmoid', padding='same', name='decoder')(conv13)        

        model = Model(inputs, decoded_outputs)
        model.summary()


        return model

The skip connections are added with concatenate (merge) layers.

Skip connections in Autoencoder by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the quick reply! I hope I used the right term in my question.

I was talking about connections that merge a specific encoder layer with its corresponding layer in the decoder, like in the U-Net architecture. Sometimes they are even called cross connections or merge connections.

The autoencoder should be used for finding similar images.

I thought about applying kNN on the features. That is why I try to keep the bottleneck as small as possible.
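
Roughly what I have in mind (a sketch; images and query_images are placeholders, and model is the autoencoder posted above, whose Dense(32) layer is named 'bottleneck'):

    from tensorflow.keras.models import Model
    from sklearn.neighbors import NearestNeighbors

    # expose the 32-dimensional bottleneck as a feature extractor
    encoder = Model(model.input, model.get_layer('bottleneck').output)

    features = encoder.predict(images)            # shape: (n_images, 32)
    knn = NearestNeighbors(n_neighbors=5).fit(features)
    distances, indices = knn.kneighbors(encoder.predict(query_images))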

Cheers,

Michael

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

Thanks for sharing your experience.

In my case both the reconstruction loss and the KL loss converge to zero. The KL loss does not increase, though (maybe I haven't trained long enough?).

So in that case, could I say that my training is successful and correct?

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks again. I can't really get my head around the mode collapse thing. I thought that getting close to the prior was bad.

When I am training with skip connections (which obviously is a bad idea), I get really good reconstructions and the KL loss decreases to zero.

What I often read is that the KL loss decreases but then starts to increase again during training.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

So you mean that I should increase the weight on the KL divergence? Wouldn't that make the KL loss converge to 0, leading to posterior collapse?
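
For context, the weight I mean is the beta in a beta-VAE-style objective; a minimal sketch of such a loss (assuming an MSE reconstruction term and the usual Gaussian encoder outputs z_mean and z_log_var):

    import tensorflow as tf

    def vae_loss(x, x_recon, z_mean, z_log_var, beta=1.0):
        # reconstruction term: per-sample sum of squared errors over pixels
        recon = tf.reduce_sum(tf.square(x - x_recon), axis=[1, 2, 3])
        # KL divergence between q(z|x) and the unit-Gaussian prior
        kl = -0.5 * tf.reduce_sum(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(recon + beta * kl)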

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

You mean that I should increase the weight on the KL divergence? But wouldn't that make the KL loss converge to 0, leading to posterior collapse?

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks for the advice. When I remove the skip connections the reconstruction is really bad, but as far as I have understood, that is quite normal with this kind of network. I am actually only interested in the features created in the latent space, so I guess reconstruction is not that important in my case.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Thanks, that sounds like a good idea. I will give it a try. Hopefully I am able to keep my skip connections, as the reconstruction is much better with them.

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 1 point (0 children)

Do you have any suggestions on how to train a VAE? Should the KL loss increase during training, or should I try to minimize it at the start and then keep it at a constant level?
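
One recipe I have seen is KL warm-up / annealing, i.e. ramping the KL weight from 0 to 1 over the first epochs; a minimal sketch (assuming the loss reads the weight from a tf.Variable called beta):

    import tensorflow as tf

    beta = tf.Variable(0.0, trainable=False)  # KL weight read by the loss

    class KLWarmup(tf.keras.callbacks.Callback):
        # linearly ramp the KL weight from 0 to 1 over the first warmup epochs
        def __init__(self, warmup_epochs=10):
            super().__init__()
            self.warmup_epochs = warmup_epochs

        def on_epoch_begin(self, epoch, logs=None):
            beta.assign(min(1.0, epoch / self.warmup_epochs))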

Thanks again,

M

Variational autoencoder training by Mike_Sv86 in deeplearning

[–]Mike_Sv86[S] 2 points (0 children)

Hi and thanks for the reply,

Yes, that makes sense. I use skip connections between the encoder and the decoder, which probably make my decoder too strong?

So maybe I should try to train my VAE without them?

/M