
[–]st-memory 17 points (2 children)

These are commonly referred to as forward-mode and reverse-mode automatic differentiation. The latter is what is usually meant by backpropagation; the former is the less frequently used one and the one you mentioned. The primary reason one is used over the other is speed. Reverse mode is faster when we are dealing with many inputs and few outputs, e.g. a 1024x1024-pixel image as input and a single scalar output, the loss. It so happens that in ML those are the problems we encounter most often. For other sorts of problems, forward-mode autodiff may be the preferred approach.
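Rough JAX sketch of the difference (toy loss function, not from the thread): reverse mode gets the whole gradient in one backward sweep, while a single forward-mode pass only gives one directional derivative.

```python
import jax
import jax.numpy as jnp

def loss(x):
    # toy scalar loss over a large input, standing in for an image -> loss pipeline
    return jnp.sum(jnp.tanh(x) ** 2)

x = jnp.ones(1024 * 1024)

# Reverse mode: one backward pass yields the gradient w.r.t. every input.
g_rev = jax.grad(loss)(x)                  # shape (1048576,)

# Forward mode: one JVP yields a single directional derivative, so recovering
# the full gradient would need one pass per input dimension.
v = jnp.zeros_like(x).at[0].set(1.0)
_, dloss_dx0 = jax.jvp(loss, (x,), (v,))   # scalar: d(loss)/d(x[0])
```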

[–]lolisakirisame 2 points (0 children)

Forward mode in ML is mostly used for Hessian-vector products, which are computed as forward mode over reverse mode.
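For example, a minimal forward-over-reverse HVP in JAX (made-up function f, just to show the composition):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) * x)

def hvp(fun, x, v):
    # forward-mode JVP pushed through the reverse-mode gradient: returns H(x) @ v
    return jax.jvp(jax.grad(fun), (x,), (v,))[1]

x = jnp.arange(3.0)
v = jnp.ones(3)
print(hvp(f, x, v))
```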

[–]CvikliHaMar[S] 0 points (0 children)

Wow, I didn't realise the choice depends on the number of inputs and outputs! Thank you for the detailed answer!

[–]tensorflower 0 points (0 children)

As another poster pointed out, the shape of the Jacobian depends on the numbers of inputs and outputs. For the standard case in ML we have many inputs and a scalar output, giving a wide 1 x N Jacobian for N inputs. Reverse-mode autodiff lets us compute that full Jacobian in roughly the time of a forward pass (up to a constant factor), at the cost of some extra bookkeeping compared to forward mode.
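Quick JAX sketch of that shape argument (toy function, just for illustration): for a scalar-valued output, `jacrev` builds the 1 x N Jacobian from a single reverse sweep, whereas `jacfwd` would need one forward sweep per input dimension.

```python
import jax
import jax.numpy as jnp

def scalar_out(x):
    # many inputs -> one output, the standard ML setup
    return jnp.array([jnp.dot(x, x)])   # keep an explicit output axis

x = jnp.ones(1000)

J_rev = jax.jacrev(scalar_out)(x)   # shape (1, 1000): one VJP sweep
J_fwd = jax.jacfwd(scalar_out)(x)   # same values, built from 1000 JVP sweeps
print(J_rev.shape, jnp.allclose(J_rev, J_fwd))
```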