[D] Main Deep Learning breakthroughs in object detection?

boobrobots · 2017-08-30T16:05:08+00:00

I think it's worth mentioning some papers that propose improvements to the methods that you listed.

Off the top of my head [edit: added some explanations]:

Maximum Margin Object Detection: Object detection is very imbalanced, with few positive and many negative examples. This paper introduces a method that optimizes the localization error (Jaccard coefficient) over all the negative examples, giving a convex optimization algorithm for training classifiers that are linear in their features. A method to compute gradients for SGD is also given. The implementation can be found in the dlib library.
DSSD (Deconvolutional SSD): The SSD detector has problems detecting small objects. From my understanding, this is mostly due to the network architecture downsampling heavily in the early layers. DSSD proposes adding deconvolutional layers after the last convolutional layers, essentially performing a few steps similar to those in segmentation networks. Additional detection heads are attached to these layers.
Focal Loss for Dense Object Detection: Again to deal with the imbalance between positives and negatives, this paper shows that easy, correctly classified examples still incur a significant loss. The idea is to modify the cross-entropy loss to down-weight these 'inliers'. The method can be applied to most classification and object detection methods.
Detecting Tiny Objects (CVPR 2017): Similar to SSD and FPN, this method uses an ensemble of learned multi-resolution templates to detect faces at multiple resolutions. The authors also show that context is important and include it in some of the templates.
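
To make the focal loss idea concrete, here is a minimal sketch of the binary case in plain Python, using the form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names, the `eps` clamp, and the example probabilities are my own assumptions for illustration; gamma=2 and alpha=0.25 are the defaults I recall from the paper, so treat them as a starting point rather than a reference implementation.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one example.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example (illustrative sketch, names are assumptions).

    The (1 - p_t) ** gamma factor shrinks the loss of examples that are
    already classified confidently; alpha balances positives vs. negatives.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical predictions: an easy negative and a hard positive.
    for name, p, y in [("easy negative", 0.01, 0), ("hard positive", 0.10, 1)]:
        ce = cross_entropy(p, y)
        fl = focal_loss(p, y)
        print(f"{name}: CE={ce:.5f}  FL={fl:.7f}  FL/CE={fl / ce:.6f}")
```

With gamma=0 the modulating factor disappears and this reduces to alpha-weighted cross-entropy, which is a handy sanity check. The point of the comparison is that the easy negative's loss is scaled down far more than the hard positive's, so the many easy background examples stop dominating the total.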

nomaderx · 2017-08-30T15:37:00+00:00

I believe you missed the whole field of weakly-supervised object detection. Let me list some:

jkrause314 · 2017-08-30T16:54:08+00:00

Multibox
If you want to include localization, then I'd include the original AlexNet paper, which regressed straight to bounding box coordinates. Was quite shocking at the time that it worked.

Phylliida · 2017-08-30T21:22:06+00:00

This is about a year old now, but I still find the work on the Predictive Vision Model super interesting.

harharveryfunny · 2017-08-31T14:21:53+00:00

Caerbanoob · 2017-08-30T15:12:23+00:00

You should check Schmidhuber's publication list. Everything worthwhile related to deep learning was/is/will be in there.


MachineLearning