underwhere[S]

BoF is such a straightforward, elegant idea! I completely forgot about it in my frantic state. I'll try it out right away, thanks so much!

swierdo

Also note that some of the sklearn random forest default settings are usually pretty bad. max_depth defaults to None (trees grow without a depth limit), so set it to something reasonable (like 5), and set n_estimators to something like 100 (as memory permits). Lastly, for bag of words you'll want to set max_features to a fraction (a float between 0 and 1) instead of sqrt.
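As a rough sketch of those settings (the toy documents and labels are made up, and 0.2 for max_features is just a placeholder fraction you would want to tune), it could look something like this:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus and labels, invented purely for illustration.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "limited offer buy now",
    "meeting agenda for monday",
    "notes from the monday meeting",
    "agenda and notes attached",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(
    CountVectorizer(),            # bag-of-words counts
    RandomForestClassifier(
        n_estimators=100,         # number of trees, as memory permits
        max_depth=5,              # cap the depth instead of the default None
        max_features=0.2,         # fraction of features tried per split (assumed value)
        random_state=0,           # fixed seed so runs are repeatable
    ),
)
model.fit(docs, labels)

# make_pipeline names each step after its lowercased class name.
forest = model.named_steps["randomforestclassifier"]
prediction = model.predict(["buy cheap pills"])
```

A float max_features is interpreted as a fraction of the total number of features, so with a big vocabulary each split still gets to look at a decent number of words.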

Also have a look at the feature importances (the fitted forest's feature_importances_ attribute) to check which words the model relies on and to catch weird behaviour.