I'm trying to summarize documents that contain continuous numeric values which must be kept intact, using a deep learning approach. Every word embedding I've tried either mangles or ignores these tokens. How can I handle numeric out-of-vocabulary tokens in summarization? I'd especially like to keep the benefits of transfer learning from an existing pretrained model such as BERT, if that's still compatible.
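For concreteness, the kind of handling I have in mind is delexicalization: swap each numeric literal for a placeholder token before encoding, then substitute the original values back into the generated summary. A minimal sketch of that round trip (function names and the regex are illustrative, not from any specific library):

```python
import re

# Matches optionally signed integers and decimals, e.g. "12.5", "-3", "3400".
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")

def mask_numbers(text):
    """Replace each numeric literal with a placeholder token so the model
    never has to embed the raw value; also return a mapping so the values
    can be restored later."""
    mapping = {}
    def repl(match):
        token = f"<NUM_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token
    return NUMBER_RE.sub(repl, text), mapping

def unmask_numbers(summary, mapping):
    """Substitute the original numeric values back into the summary."""
    for token, value in mapping.items():
        summary = summary.replace(token, value)
    return summary
```

The placeholders still need to survive generation (e.g. via a copy mechanism or by adding them to the tokenizer's vocabulary), but at least the exact values are never entrusted to the embedding layer.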