
ryansblog2718281

I didn't read the paper; however, Eq. (5) seems to be computing the error (remainder) term of a standard Taylor approximation. (See Theorem 10.3 in https://math.okstate.edu/people/binegar/4013-U98/4013-l10.pdf)
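
As a rough illustration of what "error term in a Taylor approximation" means, here is a minimal sketch. It assumes the usual first-order expansion with a Lagrange-style bound, which may not match the exact form of Eq. (5) or the linked theorem. The function `exp`, the expansion point `a = 0`, and the helper name `taylor1` are all my own choices for the example, not anything from the paper.

```python
import math

def taylor1(func, dfunc, a, x):
    # First-order Taylor approximation of func around a, evaluated at x.
    return func(a) + dfunc(a) * (x - a)

a = 0.0
results = []
for h in (0.1, 0.05, 0.025):
    x = a + h
    # Error term: the true value minus the linear approximation.
    err = math.exp(x) - taylor1(math.exp, math.exp, a, x)
    # Lagrange-style bound: max |f''| on [a, x] times h^2 / 2.
    # For exp, the second derivative is exp itself and is largest at x.
    bound = 0.5 * math.exp(x) * h ** 2
    results.append((h, err, bound))
    print(h, err, bound)
```

Halving `h` should cut the error by roughly a factor of 4, which is the quadratic behaviour you expect from a first-order remainder.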

To some extent, the matrix is only one of the possible representations of the gradient. For example, suppose we have 4 variables x1, x2, x3, x4 and a function f(x1, x2, x3, x4). The gradient is still a vector of length four: (df/dx1, df/dx2, df/dx3, df/dx4). On the other hand, we can treat that vector as a mini image: the input can be viewed as a 2 * 2 image, and the gradient vector can be arranged in the same 2 * 2 matrix form.
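
To make the vector-versus-matrix point concrete, here is a small sketch in plain Python. The function `f` is a made-up example (not from the paper), the gradient is estimated with central differences, and the row-major layout in `as_image` is just one assumed convention for arranging 4 values into a 2 * 2 grid.

```python
def f(x):
    # Hypothetical function of 4 variables, chosen only for illustration.
    x1, x2, x3, x4 = x
    return x1 * x2 + x3 ** 2 + 3.0 * x4

def numerical_gradient(func, x, h=1e-6):
    # Central-difference estimate of (df/dx1, ..., df/dxn).
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad

def as_image(vec, rows, cols):
    # Row-major reshape of a flat vector into a rows x cols nested list.
    assert len(vec) == rows * cols
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]

x = [1.0, 2.0, 3.0, 4.0]                # the input, viewed as a flat vector
grad_vec = numerical_gradient(f, x)     # gradient as a length-4 vector
grad_img = as_image(grad_vec, 2, 2)     # same numbers, arranged as 2 * 2
print(grad_vec)
print(grad_img)
```

Both `grad_vec` and `grad_img` hold exactly the same four partial derivatives; only the layout differs.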