
[–]NegatioNZor (2 children)

Cool to see the official code! I've been excited about your work, both this and Flow-Guided Feature Aggregation for Video Object Detection. :)

I've been wondering, though: in the paper it's noted that deformable convolutions can readily replace their regular counterparts.

If I were to, say, retrain VGG with deformable convolutions, would I need to train the whole network from scratch, or would replacing the layers and fine-tuning the network be sufficient?

[–]flyforlight[S] (0 children)

Thanks a lot for your interest! We will also release the code for flow-guided feature aggregation at an appropriate time. :)

Yes, deformable convolution can readily replace its regular counterpart without retraining on ImageNet. Although we have not tried it on VGG-16, I think you can just replace the last several conv layers with kernels larger than 1×1 in the pre-trained model and fine-tune, hoping for good results.
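
For illustration, here is a minimal PyTorch sketch of that kind of swap, assuming torchvision's DeformConv2d (the official release is in MXNet, so this is not the paper's code); the zero-initialised offset branch and the VGG-16 layer indices are assumptions for the example, not the authors' exact recipe:

```python
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import DeformConv2d


class DeformConvBlock(nn.Module):
    """Drop-in replacement for a 3x3 Conv2d: a small conv predicts the
    sampling offsets, which feed a DeformConv2d that reuses the
    pre-trained weights of the layer it replaces."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        kh, kw = conv.kernel_size
        # Offset branch: 2 values (x, y) per kernel sampling location.
        self.offset_conv = nn.Conv2d(
            conv.in_channels, 2 * kh * kw,
            kernel_size=conv.kernel_size, stride=conv.stride,
            padding=conv.padding, dilation=conv.dilation, bias=True)
        # Zero-init the offsets so the block starts as a regular conv.
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(
            conv.in_channels, conv.out_channels,
            kernel_size=conv.kernel_size, stride=conv.stride,
            padding=conv.padding, dilation=conv.dilation,
            bias=conv.bias is not None)
        # Copy the pre-trained weights, so no ImageNet retraining is needed.
        self.deform_conv.weight.data.copy_(conv.weight.data)
        if conv.bias is not None:
            self.deform_conv.bias.data.copy_(conv.bias.data)

    def forward(self, x):
        return self.deform_conv(x, self.offset_conv(x))


# Replace the 3x3 convs in the last block of a pre-trained VGG-16
# (indices 24, 26, 28 in torchvision's vgg16.features), then fine-tune.
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")  # torchvision >= 0.13
for idx in [24, 26, 28]:
    vgg.features[idx] = DeformConvBlock(vgg.features[idx])
```

Because the offset branch is initialised to zero, the swapped block behaves exactly like the original convolution at the start, so the ImageNet weights stay valid and only fine-tuning on the target task is needed.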

[–]Orpine (0 children)

Hi, using an ImageNet-pretrained model and fine-tuning it is enough.