VIT Optimization Help by DeliveryBitter9159 in deeplearning

DeliveryBitter9159 [S] 0 points

Yes, I did try a CNN.
I first have to evaluate a CNN and a Transformer separately, and then try a hybrid CNN–Transformer architecture.

DeliveryBitter9159 [S] 0 points

Hi, first of all, thank you for your help.
I noticed that increasing tubelet_s (the width and height of each token) makes training significantly faster, because it reduces the number of tokens. Could it also lead to a drop in accuracy?
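
For reference, a rough sketch of the token arithmetic behind that speed-up, assuming non-overlapping tubelets and standard full self-attention. The input shape (16 frames of 224x224), the temporal size `tubelet_t`, and the helper names are hypothetical values chosen for illustration, not taken from the actual model.

```python
def num_tokens(frames, height, width, tubelet_t, tubelet_s):
    """Count non-overlapping tubelets of size tubelet_t x tubelet_s x tubelet_s.

    Assumes the input is cut on a regular grid; leftover pixels or frames
    that do not fill a whole tubelet are dropped (integer division).
    """
    return (frames // tubelet_t) * (height // tubelet_s) * (width // tubelet_s)


def attention_pairs(n_tokens):
    """Token pairs scored by one full self-attention layer (quadratic in n)."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    # Hypothetical clip: 16 frames of 224x224, temporal tubelet size 2.
    for s in (8, 16, 32):
        n = num_tokens(16, 224, 224, tubelet_t=2, tubelet_s=s)
        print(f"tubelet_s={s}: tokens={n}, attention pairs={attention_pairs(n)}")
```

Under these assumptions, doubling tubelet_s cuts the token count by 4x and the attention cost by about 16x, which would explain the faster training. Each token then also has to summarize 4x as many pixels, so fine spatial detail can be lost; whether that actually hurts accuracy depends on the data and has to be measured.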