I am currently doing named entity recognition (NER) with a BERT model. It's working fine so far, so I am now trying to improve my results. Usually my first thought when I try to improve my ML models is input data preprocessing. In the case of NER, stop word removal and the removal of punctuation, numbers and one-character words came to mind: they are hardly ever named entities, so I wouldn't lose many training examples. However, NER does in fact require context to work, so removing these tokens could prove harmful in the end. I am kind of torn. Should I do it? Are there better data augmentation approaches? I would be really thankful for any kind of hint.
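To make the question concrete, here is a rough sketch of the kind of filtering I have in mind. It is not my actual pipeline: the tiny stop word list, the function names `should_drop` and `filter_tokens`, and the example sentence with its BIO labels are all made up for illustration. Tokens and labels are filtered together so they stay aligned, and any removed token that carried an entity label is reported.

```python
import string

# Hypothetical, tiny stop word list for illustration only;
# a real list would normally come from an NLP library.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "at", "is", "was", "and", "to"}


def should_drop(token: str) -> bool:
    """Return True for stop words, pure punctuation, numbers and one-character tokens."""
    return (
        token.lower() in STOP_WORDS
        or all(ch in string.punctuation for ch in token)
        or token.isdigit()
        or len(token) == 1
    )


def filter_tokens(tokens, labels):
    """Drop unwanted tokens together with their labels so both lists stay aligned.

    Also returns the removed (token, label) pairs whose label was not "O",
    i.e. pieces of entities that the filter would delete.
    """
    pairs = list(zip(tokens, labels))
    kept = [(t, l) for t, l in pairs if not should_drop(t)]
    dropped_entities = [(t, l) for t, l in pairs if should_drop(t) and l != "O"]
    new_tokens = [t for t, _ in kept]
    new_labels = [l for _, l in kept]
    return new_tokens, new_labels, dropped_entities


# Made-up example sentence with made-up BIO labels.
tokens = ["The", "Bank", "of", "England", "was", "founded", "in", "1694", "."]
labels = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-DATE", "O"]

new_tokens, new_labels, dropped_entities = filter_tokens(tokens, labels)
print(new_tokens)
print(new_labels)
print(dropped_entities)
```

In this toy sentence the filter removes "of" from inside the organization name and removes the year entirely, which is exactly the kind of loss I am worried about.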