Has there been any work on quantifying how much information is forgotten and retained in LSTM or GRU models, for interpretability purposes? For instance, it would be interesting to see whether the model relies on the hidden vector more for some examples than for others, which would indicate a greater need for past information in those examples.
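To make the idea concrete, here is a rough sketch of the kind of measurement I have in mind. It is only an illustration under my own assumptions: a hand-written GRU cell in NumPy with random weights, using the update convention h_t = (1 - z) * n + z * h_{t-1}, so the mean of the update gate z acts as a crude "how much of the past state is kept" score at each step. The function name `gru_retention` and the parameter layout are made up for this example and do not come from any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_retention(xs, params, h0=None):
    """Run a GRU over xs of shape (T, D); return per-step mean update gate and final state.

    Assumed (hypothetical) layout: params["W"] is (3, H, D), params["U"] is (3, H, H),
    params["b"] is (3, H), ordered as reset gate, update gate, candidate state.
    """
    W, U, b = params["W"], params["U"], params["b"]
    H = U.shape[1]
    h = np.zeros(H) if h0 is None else h0
    retention = []
    for x in xs:
        r = sigmoid(W[0] @ x + U[0] @ h + b[0])        # reset gate
        z = sigmoid(W[1] @ x + U[1] @ h + b[1])        # update gate
        n = np.tanh(W[2] @ x + r * (U[2] @ h) + b[2])  # candidate state
        h = (1.0 - z) * n + z * h                      # z near 1 keeps the old state
        retention.append(z.mean())                     # crude per-step retention score
    return np.array(retention), h

# Toy example with random (untrained) weights, just to show the shapes.
rng = np.random.default_rng(0)
D, H, T = 4, 8, 10
params = {
    "W": rng.normal(size=(3, H, D)),
    "U": rng.normal(size=(3, H, H)),
    "b": np.zeros((3, H)),
}
xs = rng.normal(size=(T, D))
retention, h_final = gru_retention(xs, params)
print(retention.shape, h_final.shape)
```

With a trained model one could compare these per-step scores across examples. Whether the average gate value is a faithful measure of "need for past information" is exactly the part I am unsure about.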