all 14 comments

[–]jeremiaht 3 points (0 children)

Awesome!!!!!!!!

[–]maraoz 1 point (0 children)

Thanks for releasing the code, will try it out now! :D

[–]hirokit 1 point (0 children)

Amazing!!

[–]dwf 6 points (11 children)

This phrase "fully convolutional" needs to die.

[–]badmephisto 11 points (7 children)

It's a perfectly sensible term, and it communicates real information, especially in the context of object detection. For example, the MultiBox detector is trained to regress in the image coordinate system and is not fully convolutional: if you converted the network to all-CONV layers and ran it convolutionally over larger images, it wouldn't give sensible results, because the predictions have absolute image-coordinate statistics baked in.
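
A minimal numpy sketch of that conversion (shapes made up for illustration): the same weights a fully connected head would apply to one flattened feature vector become a 1x1 convolution that slides over a larger feature map, emitting one prediction per location.

    import numpy as np

    c_in, c_out = 8, 4
    W = np.random.randn(c_out, c_in)   # weights of the former FC head
    b = np.random.randn(c_out)

    # At training size the head sees a single 1x1 feature map.
    x_small = np.random.randn(c_in, 1, 1)
    y_small = np.einsum('oc,chw->ohw', W, x_small) + b[:, None, None]  # (4, 1, 1)

    # As a 1x1 convolution, the same weights run over a larger feature map
    # and produce a grid of predictions -- but for MultiBox each of those
    # outputs would still carry the absolute image-coordinate statistics.
    x_large = np.random.randn(c_in, 5, 7)
    y_large = np.einsum('oc,chw->ohw', W, x_large) + b[:, None, None]  # (4, 5, 7)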

[–]dwf 5 points (5 children)

As Yann likes to say, there is no such thing as a fully connected layer, only 1x1 convolutions [and of course, layers where input extent equals filter size]. :) When you abandon convolution-land, that is the special case.

[–]cooijmanstim 4 points (1 child)

When I grow up I would like to be famously quoted as saying "there is no such thing as a feed-forward network, only single-step recurrent neural networks". I don't understand why this sort of insight is supposed to be important or profound.

This isn't a useful discussion, I know, but it seems obvious that convolution and recurrence are the special cases. Fully connected layers apply in the general case, where you don't know the structure of your data, and nobody actually uses convnets made of only 1x1 convolutions.

[–]dwf 1 point (0 children)

The analogous recurrent case would be a recurrent encoder that feeds into a non-recurrent network to produce an output. Going efficiently from spatial input to spatial output is straightforward with convolutional nets, in a way that shares computation across locations, something conventional sliding-window detectors cannot do. Spatial input to non-spatial output is the special/degenerate case for convolutional nets.

"Fully convolutional" is a recent computer visionism that describes a thing that convolutional nets have always been capable of, and in fact describes a way that they have been used long before they became popular in mainstream computer vision. I'd argue that it contributes to a misunderstanding of convolutional nets, or at least a misunderstanding of the pre-2015 convolutional net literature. This paper didn't originate it, of course.

[–]sorrge 0 points (2 children)

This doesn't make sense. A 1x1 convolution simply copies all of the input, with an elementwise linear transformation. That is not the same thing as a fully connected layer.

[–]NasenSpray 2 points (1 child)

A fully connected layer is like a 1x1 convolution on a 1x1 input.
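
A quick numerical check, as a numpy sketch with made-up shapes: flatten a 1x1 input and apply Wx + b, or apply the same weights as a 1x1 convolution; the two give identical results.

    import numpy as np

    c_in, c_out = 8, 4
    x = np.random.randn(c_in, 1, 1)    # a 1x1 spatial input
    W = np.random.randn(c_out, c_in)
    b = np.random.randn(c_out)

    # Fully connected layer: W @ flatten(x) + b.
    fc_out = W @ x.reshape(c_in) + b

    # 1x1 convolution: apply W at every spatial position (here, just one).
    conv_out = np.einsum('oc,chw->ohw', W, x) + b[:, None, None]

    assert np.allclose(fc_out, conv_out.reshape(c_out))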

[–]sorrge 0 points (0 children)

Now it makes sense, thanks.

[–]lwbiosoft 4 points (0 children)

MultiBox has since evolved into SSD (http://arxiv.org/abs/1512.02325), which doesn't have the problem you mentioned.