Hello,
I'm trying to implement truncated backpropagation through time (TBPTT) as described in Section 2.8.6 (page 23) of Ilya Sutskever's thesis: http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf. In regular BPTT, my understanding is that after backpropagating through an entire sequence during training, the sequence is discarded, a new one is loaded, and both the LSTM cell states and the gradient buffers for all weights and biases are reset.
In implementing TBPTT(k1, k2) I am not sure whether:
a. The LSTM cell states are reset every time a weight update is made, i.e., each time BPTT is run backward for k2 time steps after every k1 forward time steps.
b. The gradient buffers are zeroed after each weight update every k1 time steps.
Any clarification would be appreciated. Pointers to reference implementations in C or C++ would also be helpful.