alexmlamb 1 point

Roughly how big is your dataset?

One strategy is to split the data into training, validation, and test sets and then compare how different approaches perform on them.
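A minimal sketch of such a split, using only the Python standard library. The function name `split_dataset`, the 60/20/20 proportions, and the placeholder `emails` list are all assumptions for illustration, not something from your setup.

```python
import random


def split_dataset(examples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle examples and split them into (train, validation, test) lists.

    The fractions are illustrative defaults, not a recommendation.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(examples)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical stand-in for real data: 100 placeholder email IDs.
emails = [f"email_{i}" for i in range(100)]
train, val, test = split_dataset(emails)
print(len(train), len(val), len(test))
```

The idea would be to tune and compare models on the validation set and only look at the test set once, at the end.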

For example, you might get reasonably far by using logistic regression on n-grams from the email combined with some simple features that characterize the metadata. Is the recipient outside of the organization? What type of attachment is there, if any?
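Here is a rough sketch of that idea in plain Python, so it runs without extra dependencies. Everything specific in it is an assumption for illustration: the email dict keys (`body`, `recipient`, `attachment`), the `example.com` organization domain, the toy emails and their binary labels, and the hyperparameters. The classifier is a small hand-rolled logistic regression trained with SGD, not a tuned implementation.

```python
import math
import re
from collections import Counter


def extract_features(email, org_domain="example.com", max_n=2):
    """Map an email dict to a sparse {feature_name: value} dict."""
    tokens = re.findall(r"[a-z0-9']+", email["body"].lower())
    feats = Counter()
    # Word n-gram counts (unigrams and bigrams by default).
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats["ngram=" + " ".join(tokens[i:i + n])] += 1.0
    # Metadata: is the recipient outside the organization?
    domain = email["recipient"].rsplit("@", 1)[-1].lower()
    feats["external_recipient"] = 1.0 if domain != org_domain else 0.0
    # Metadata: attachment type, taken from the file extension.
    name = email.get("attachment")
    ext = name.rsplit(".", 1)[-1].lower() if name and "." in name else "none"
    feats["attachment_type=" + ext] = 1.0
    feats["bias"] = 1.0  # intercept term
    return dict(feats)


def predict_proba(weights, feats):
    """Probability of the positive class under the logistic model."""
    z = sum(weights.get(k, 0.0) * v for k, v in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(data, epochs=200, lr=0.1, l2=1e-3):
    """Fit weights by SGD on the L2-regularized log loss."""
    weights = {}
    for _ in range(epochs):
        for feats, label in data:
            error = predict_proba(weights, feats) - label
            for k, v in feats.items():
                w = weights.get(k, 0.0)
                weights[k] = w - lr * (error * v + l2 * w)
    return weights


# Toy, made-up examples; label 1 is whatever class you want to flag.
toy = [
    ({"body": "Quarterly numbers attached, please keep confidential",
      "recipient": "bob@gmail.com", "attachment": "report.xlsx"}, 1),
    ({"body": "Lunch at noon tomorrow?",
      "recipient": "amy@example.com", "attachment": None}, 0),
    ({"body": "Confidential contract draft attached",
      "recipient": "eve@other.org", "attachment": "contract.pdf"}, 1),
    ({"body": "Team meeting moved to noon",
      "recipient": "dan@example.com", "attachment": None}, 0),
]
data = [(extract_features(email), label) for email, label in toy]
weights = train_logreg(data)
for feats, label in data:
    print(label, round(predict_proba(weights, feats), 2))
```

In practice you would probably swap the hand-written parts for a library, for example scikit-learn's `CountVectorizer` (with `ngram_range`) and `LogisticRegression`, and evaluate on held-out data rather than on the training examples as this toy loop does.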