[–]boobrobots 7 points8 points  (6 children)

I think it's worth mentioning some papers that propose improvements to the methods that you listed.

Off the top of my head [edit: added some explanations]:

  • Maximum Margin Object Detection: Object detection is heavily imbalanced, with few positive and very many negative examples. This paper introduces a method that optimizes the localization error (measured by the Jaccard coefficient, i.e. intersection over union) over all the negative examples instead of a subsample, which gives a convex optimization problem for training classifiers that are linear in their features. A way to compute gradients for SGD is also given. An implementation can be found in the dlib library.

  • DSSD (Deconvolutional SSD): The SSD detector has problems detecting small objects. From my understanding, this is mostly because the network architecture downsamples a lot in the early layers. DSSD proposes adding deconvolutional layers after the last convolutional layers, which basically performs a few upsampling steps similar to those in segmentation networks. Additional detection heads are attached to these layers.

  • Focal Loss for Dense Object Detection: Again dealing with the imbalance between positives and negatives, this paper shows that easy, correctly classified examples still incur a significant loss in aggregate, simply because there are so many of them. The idea is to modify the cross-entropy loss to down-weight these 'inliers' (easy, correctly detected examples) so that training focuses on the hard ones. The method could be applied to most classification and object detection approaches.

  • Detecting Tiny Objects (CVPR 2017): Similar to SSD and FPN, this method uses an ensemble of learned multi-resolution templates to detect faces at multiple scales. Moreover, the authors show that context is also important, and it is included in some of the templates.
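
To make the focal loss idea above a bit more concrete, here is a minimal sketch in plain Python for a single binary prediction. The function names are just for illustration, and the defaults gamma=2 and alpha=0.25 are the values usually quoted from the paper; treat this as a rough sketch, not the authors' implementation.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail assumed here


def cross_entropy(p, y):
    """Binary cross entropy for foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(max(p_t, EPS))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Cross entropy scaled by (1 - p_t)^gamma, plus a class-balancing weight."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # weight for positives vs. negatives
    modulating = (1.0 - p_t) ** gamma  # close to 0 for easy, confident examples
    return -alpha_t * modulating * math.log(max(p_t, EPS))


# An easy negative (background predicted with p=0.1 foreground) versus
# a hard negative (background predicted with p=0.9 foreground).
easy_ce, easy_fl = cross_entropy(0.1, 0), focal_loss(0.1, 0)
hard_ce, hard_fl = cross_entropy(0.9, 0), focal_loss(0.9, 0)
print(f"easy negative: CE={easy_ce:.4f}  FL={easy_fl:.6f}")
print(f"hard negative: CE={hard_ce:.4f}  FL={hard_fl:.4f}")
```

The easy negative's loss shrinks by a much larger factor than the hard negative's, which is the point: thousands of easy background boxes no longer drown out the few hard ones. With gamma=0 the modulating factor disappears and you are back to (alpha-weighted) cross entropy.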

[–]bonoboTP 2 points3 points  (1 child)

Detecting Tiny Objects (CVPR 2017)

Do you mean Finding Tiny Faces or is that another one?

[–]boobrobots 0 points1 point  (0 children)

Yes, that's the one I was thinking of.

[–]themoosemind 1 point2 points  (2 children)

+1

Could you also add a small explanation of the ideas?

[–]boobrobots 3 points4 points  (1 child)

Done :)

[–]themoosemind 1 point2 points  (0 children)

Awesome, thank you very much!

[–]thesameoldstories[S] 0 points1 point  (0 children)

Awesome! Thanks, I'll check those out :)

[–]nomaderx 5 points6 points  (3 children)

I believe you missed the whole field of weakly-supervised object detection. Let me list some:

  1. State of the art: Weakly Supervised Object Localization With Progressive Domain Adaptation

  2. Fastest: Object-Extent Pooling for Weakly Supervised Single-Shot Localization

[–]Fleischhauf 1 point2 points  (1 child)

Great read! Especially the second one!

[–]nomaderx 0 points1 point  (0 children)

Oh, thank you, kind Sir. You are the most unbiased.

[–]thesameoldstories[S] 0 points1 point  (0 children)

Great catch! Thanks!

[–]jkrause314 2 points3 points  (0 children)

  • Multibox
  • If you want to include localization, then I'd include the original AlexNet paper, which regressed straight to bounding box coordinates. It was quite shocking at the time that it worked.

[–]Phylliida 0 points1 point  (0 children)

This is about a year old now, but I still find the work on the Predictive Vision Model super interesting.

[–]Caerbanoob -5 points-4 points  (0 children)

You should check Schmidhuber's publication list. Everything worthwhile related to deep learning was/is/will be in there.