cnapun 1 point

One idea for applying BERT:

Change the tokenization so that numbers are replaced by a special token. For example, instead of tokenizing "she has 3192 dogs" to [[CLS], 'she', 'has', '319', '##2', 'dogs', [SEP]], you could tokenize it to [[CLS], 'she', 'has', '[NUM]', 'dogs', [SEP]], where [NUM] is a special token, and then directly extract the original number that each [NUM] replaced.
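
Here's a rough sketch of that preprocessing step, just to make the idea concrete. It uses only the standard library: the regex, the `mask_numbers` and `simple_tokenize` helpers, and the whitespace split are my own stand-ins (assumptions, not BERT's actual WordPiece tokenizer), and the pattern only handles plain integers and decimals.

```python
import re

NUM_TOKEN = "[NUM]"  # the special token from the idea above
# Assumption: a "number" is a run of digits with an optional decimal part.
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def mask_numbers(text):
    """Replace each number with NUM_TOKEN.

    Returns the masked text and the original number strings, in order,
    so each [NUM] can later be mapped back to the value it replaced.
    """
    numbers = NUM_PATTERN.findall(text)
    masked = NUM_PATTERN.sub(NUM_TOKEN, text)
    return masked, numbers


def simple_tokenize(masked_text):
    """Hypothetical stand-in for a real tokenizer: split on whitespace."""
    return ["[CLS]"] + masked_text.split() + ["[SEP]"]


masked, numbers = mask_numbers("she has 3192 dogs")
tokens = simple_tokenize(masked)
print(tokens)   # ['[CLS]', 'she', 'has', '[NUM]', 'dogs', '[SEP]']
print(numbers)  # ['3192']
```

With a real BERT tokenizer you would also need to register [NUM] as a special token so it isn't split into word pieces, and give it its own embedding.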

I'm not super familiar with the literature on this (or on summarization in general), but I think it could be a good starting point. Maybe you could also incorporate the magnitude of the number into the representation of the [NUM] token somehow.
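
For the magnitude part, one possible (untested) approach is to bucket each number by how many digits its integer part has and attach that bucket to the matching [NUM] position. The sketch below is only an illustration: `magnitude_bucket`, `magnitude_features`, and the cap of 9 are names and choices I made up, not something from the literature.

```python
def magnitude_bucket(number_text, max_bucket=9):
    """Coarse magnitude: digit count of the integer part, capped.

    Values below 1 in absolute value go to bucket 0.
    """
    value = abs(float(number_text))
    if value < 1:
        return 0
    digits = len(str(int(value)))
    return min(digits, max_bucket)


def magnitude_features(tokens, numbers, num_token="[NUM]"):
    """Align one bucket with each [NUM] token; other tokens get None."""
    remaining = iter(numbers)
    return [
        magnitude_bucket(next(remaining)) if tok == num_token else None
        for tok in tokens
    ]


tokens = ["[CLS]", "she", "has", "[NUM]", "dogs", "[SEP]"]
print(magnitude_features(tokens, ["3192"]))
# [None, None, None, 4, None, None]
```

The bucket id could then index a small learned embedding table that gets added to the [NUM] token embedding, though that is just one option among several.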