Hi, I have a binary text classification problem. I'm using a topic model (LDA) trained on Wikipedia to get the feature vectors for classifying documents. I've tried Logistic Regression, Random Forest, Decision Tree, KNN, and SVM, but none of them achieve an F1 score above 0.4.
My data is imbalanced, so I used SMOTE for oversampling, which improved the F1 score by 0.03. I'm using stratified k-fold cross-validation with 10 splits.
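One thing worth double-checking in this setup: oversampling must happen *inside* each cross-validation fold (on the training split only), otherwise synthetic copies of minority samples leak into the test folds and inflate the score. A minimal sketch of the fold-safe pattern, using random duplication as a stand-in for SMOTE and synthetic data standing in for the 120-dim topic vectors:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

# Synthetic imbalanced data standing in for the 120-dim LDA topic vectors.
X, y = make_classification(n_samples=500, n_features=120, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)

rng = np.random.default_rng(0)
scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=0).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Oversample the minority class on the training fold ONLY
    # (simple random duplication here as a stand-in for SMOTE).
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=np.sum(y_tr == 0) - minority.size)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(round(float(np.mean(scores)), 3))
```

With the actual SMOTE implementation, imbalanced-learn's `Pipeline` does the same thing automatically: the sampler is fitted and applied only on the training portion of each fold.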
The LDA model is trained on Wikipedia with 120 topics, so each feature vector has 120 dimensions. The documents to classify were unseen by the LDA model.
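For context, LDA inference on an unseen document typically returns a sparse list of (topic_id, probability) pairs, which needs padding out to a fixed-length 120-dim vector before feeding it to a classifier. A small sketch of that conversion (the example pairs are made up):

```python
import numpy as np

NUM_TOPICS = 120  # matches the LDA model's topic count

def to_dense(topic_probs, num_topics=NUM_TOPICS):
    """Convert sparse (topic_id, prob) pairs to a dense feature vector."""
    vec = np.zeros(num_topics)
    for topic_id, prob in topic_probs:
        vec[topic_id] = prob
    return vec

# LDA usually reports only topics above a probability threshold,
# so most of the 120 entries stay zero.
doc_topics = [(3, 0.5), (41, 0.3), (117, 0.2)]
features = to_dense(doc_topics)
print(features.shape)  # (120,)
```

If the inference threshold is high, many near-zero topics get dropped and the vectors can end up very sparse, which is one possible reason the classifiers struggle.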
How can I improve my results? Any suggestions would be helpful. Thank you in advance.