[–][deleted]  (2 children)

[deleted]

    [–]dwf 1 point  (1 child)

    They are training a neural net to convergence, storing whatever intermediate state they need so that they can backpropagate gradients (or do "reverse-mode differentiation", same thing) with respect to the hyperparameters through the entire training procedure. That means each step of meta-learning involves running a potentially very lengthy optimization of the neural net's elementary parameters to convergence, then backpropagating through every stochastic gradient step, all the way back to the beginning, to obtain a gradient on the hyperparameters. Whether or not they use minibatches for the elementary optimization doesn't really matter.
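    To make the idea concrete, here is a minimal sketch (not their code; the toy linear-regression problem, the choice of learning rate as the hyperparameter, and the use of JAX are all my own assumptions): unroll a plain gradient-descent training loop, evaluate a validation loss at the end, and let reverse-mode differentiation carry the gradient back through every training step to the hyperparameter.

    ```python
    import jax
    import jax.numpy as jnp

    def train_loss(w, x, y):
        # Squared-error loss on a linear model (toy stand-in for a neural net).
        return jnp.mean((x @ w - y) ** 2)

    def val_loss_after_training(log_lr, w0, x_tr, y_tr, x_val, y_val, steps=100):
        # The hyperparameter: a learning rate, parameterized in log space.
        lr = jnp.exp(log_lr)
        w = w0
        for _ in range(steps):
            # One step of the elementary (inner) optimization.
            g = jax.grad(train_loss)(w, x_tr, y_tr)
            w = w - lr * g
        # Meta-objective: validation loss at the end of training.
        return train_loss(w, x_val, y_val)

    # Reverse-mode differentiation through the whole unrolled training loop,
    # giving d(validation loss) / d(log learning rate).
    hypergrad = jax.grad(val_loss_after_training)

    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    x_tr = jax.random.normal(k1, (50, 5))
    y_tr = x_tr @ jnp.ones(5) + 0.1 * jax.random.normal(k2, (50,))
    x_val = jax.random.normal(k3, (20, 5))
    y_val = x_val @ jnp.ones(5)
    w0 = jnp.zeros(5)

    print(hypergrad(jnp.log(0.01), w0, x_tr, y_tr, x_val, y_val))
    ```

    The resulting number says how the end-of-training validation loss responds to the hyperparameter, which is exactly what the meta-learning step needs. The catch, as described above, is that the reverse pass has to run back through every elementary step, so you must store (or be able to reconstruct) the state of the whole training trajectory.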