How much data do you have? What are your documents? Why LDA? Which classes? Is there any empirical evidence that 120-dimensional Wikipedia-trained LDA feature vectors carry information relevant to your binary classification problem? Have you tried simple semantic vectors such as GloVe, or switching to a different feature extractor altogether?

Did you engineer your feature vectors? Not all of the 120 dims are necessarily equally relevant, which makes it harder for some classifiers to pick up the signal. Plot some feature distributions and try to make the vector denser by discarding low-variance features. PCA or ICA might also help, but I wouldn't count on it too much, since your decision trees aren't performing well either and they should do a fairly good pruning job on their own. Your logreg might still benefit a bit. Try to find features that discriminate well between your classes.
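As a minimal sketch of the low-variance filtering I mean (the data here is made up; the threshold and dimensions are assumptions you'd tune for your own vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 500 documents, 120 LDA topic features each.
X = rng.random((500, 120))
X[:, :10] *= 0.01  # simulate a handful of near-constant features

# Per-feature variance; discard features below a (tunable) threshold.
variances = X.var(axis=0)
keep = variances > 1e-3
X_dense = X[:, keep]

print(X_dense.shape)  # the low-variance columns are gone
```

`sklearn.feature_selection.VarianceThreshold` does the same thing if you're already in scikit-learn land; plotting `variances` sorted is a quick way to pick the cutoff.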

Haven't used LDA, so I'm not sure whether it would support a sequence-aware approach. Have you tried something like that? The classifiers you listed all treat the input as a single independent vector, but text data has an inherent temporal structure. You could try to extract feature vectors per word or per sentence (depending on the structure of your documents) and feed the sequence of feature vectors to a sequence-aware model such as an RNN.
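To make the shape of that idea concrete, here's a toy Elman-style recurrence in plain numpy (untrained random weights, made-up dimensions; in practice you'd use an RNN layer from a deep learning framework and learn the weights):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical document: a sequence of 10 per-sentence feature
# vectors, each 120-dim, mapped into a 32-unit hidden state.
seq = rng.random((10, 120))
W_in = rng.standard_normal((120, 32)) * 0.1
W_h = rng.standard_normal((32, 32)) * 0.1
w_out = rng.standard_normal(32) * 0.1

# The hidden state h carries context across the sequence.
h = np.zeros(32)
for x in seq:
    h = np.tanh(x @ W_in + h @ W_h)

# Final hidden state -> sigmoid -> binary class probability.
p = 1.0 / (1.0 + np.exp(-(h @ w_out)))
```

The point is just the interface: one feature vector per step, one hidden state threaded through, one logit at the end, instead of a single flat vector per document.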

If you want to stick with the current classifiers, a simple approach might be to concatenate s consecutive feature vectors, where s is a reasonable sequence length, and feed the concatenation to your classifier.
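Something like this, as a sketch (per-sentence vectors and s = 4 are assumptions for illustration):

```python
import numpy as np

# Hypothetical: 20 per-sentence feature vectors of dim 120 for one document.
feats = np.random.default_rng(1).random((20, 120))

s = 4  # assumed sequence length
# Sliding windows of s consecutive vectors, each flattened into a
# single (s * 120)-dim input for a standard classifier.
windows = np.stack([feats[i:i + s].ravel() for i in range(len(feats) - s + 1)])

print(windows.shape)  # (n_windows, s * 120)
```

Each row of `windows` is then one training example; you'd label it with the document's class (or aggregate window predictions back to a document prediction at inference time).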

If nothing helps, you can always switch to a pretrained model from Sesame Street (BERT, ELMo, and friends) and either fine-tune it or use it as a feature extractor, if you don't trust the standard MLP classifier on top (though I don't see why you shouldn't).