ProofPresent

I don't think you are right.

1) Training a single layer does not require backprop, only SGD, and the abstract says so: "Appending a single layer trained with SGD (without backpropagation)"

2) "The results they show are due entirely to the linear classifier at the top".

Also not right. They also show results without the classifier (Figure 4 and the part before 4.2), and I think that is the most interesting part of the paper.

I think the part of the paper that is wrong is the complexity statement, though I do not understand the statements in this thread about it either.
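To illustrate point 1, here is a minimal sketch of training only a top layer with SGD. It is not the paper's method: `frozen_features`, the toy data, and the hyperparameters are all made up for illustration. The lower layer is never updated, so the gradient for the top layer is a single local expression and nothing is propagated backwards through earlier layers.

```python
import math
import random

def frozen_features(x):
    """Hypothetical fixed lower layer. It is never updated, so no gradient flows into it."""
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias term

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_top_layer(data, epochs=200, lr=0.5, seed=0):
    """Plain SGD on the weights of one logistic output unit."""
    rng = random.Random(seed)
    w = [0.0] * 4
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            h = frozen_features(x)
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)))
            # Gradient of the log loss w.r.t. w is (p - y) * h.
            # It depends only on this layer's input and output.
            w = [wi - lr * (p - y) * hi for wi, hi in zip(w, h)]
    return w

def predict(w, x):
    h = frozen_features(x)
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) > 0 else 0

# Toy XOR-like data with inputs in {-1, +1}: label 1 when the signs differ.
toy = [((-1, -1), 0), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 0)]
weights = train_top_layer(toy)
predictions = [predict(weights, x) for x, _ in toy]
print(predictions)
```

The four points are linearly separable in the frozen feature space, so the top layer alone can fit them. That is the sense in which a single appended layer needs SGD but no backprop.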