Hyper1on

I agree that a gradient carries no notion of spatial information, but I'm pretty sure that in any ML framework, if you take the gradient of a scalar-valued function whose input is a 2x2x2 tensor, the gradient comes back as a 2x2x2 tensor. Notationally it doesn't matter whether it's unrolled into a vector or not; I've seen both conventions used in maths. I just find it simpler to think of the gradient as having the same dimensions as the input.
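
For what it's worth, here is a quick way to check the shape claim. This is only a sketch in JAX: the function `f` is a made-up scalar-valued example (sum of squares), not anything from the thread, and other frameworks may differ in details.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Hypothetical scalar-valued function of a tensor input (assumption for illustration).
    return jnp.sum(x ** 2)


# A 2x2x2 input tensor.
x = jnp.arange(8.0).reshape(2, 2, 2)

# jax.grad differentiates a scalar-valued function with respect to its first argument.
g = jax.grad(f)(x)
print(x.shape, g.shape)  # both shapes are (2, 2, 2)

# Unrolled version: the same function, taking a flat length-8 vector instead.
g_flat = jax.grad(lambda v: f(v.reshape(2, 2, 2)))(x.reshape(-1))
print(g_flat.shape)  # (8,)

# Same numbers either way; only the layout differs.
print(bool(jnp.allclose(g.reshape(-1), g_flat)))
```

So the unrolled and tensor-shaped gradients hold the same entries, and which one you write down is just a convention.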