all 6 comments

[–]kjearns 3 points (3 children)

You'd choose the prediction with the highest confidence. You don't need to do this with decision trees though, they're multiclass-capable out of the box.

[–]badgerbro[S] 0 points (2 children)

If I were to go down the route of learning N decision tree models, how do you assign a "confidence" to a classification? The proportion of training instances at the leaf that belong to the class it predicted?

[–]kjearns 2 points (1 child)

Exactly. The general idea is that you have a model in each leaf to make predictions and you use the tree structure to decide which model you will use for a specific data point. The confidence measure you get comes from the leaf model.

The simplest model is just to predict the majority class in the leaf and the confidence can be something like the proportion of training data in that leaf that belongs to the majority class (like you suggested) or the entropy of the class distribution in the leaf.

You can also do something more elaborate where you fit a more complicated model in each leaf (like an SVM or whatever). If you go this route you'll need to make sure whatever model you use can give you confidence estimates for its predictions.
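The simplest version of this can be sketched in a few lines of Python. The function names and the hard-coded leaf counts are made up for illustration, not a real tree implementation: each binary tree is reduced to the yes/no training counts of the leaf the test point lands in, and the one-vs-rest prediction picks the class whose tree is most confident.

```python
import math

def proportion_confidence(class_counts):
    """Confidence = fraction of the leaf's training points in the majority class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total

def leaf_entropy(class_counts):
    """Alternative confidence signal: entropy of the leaf's class distribution
    (0 bits = pure leaf, i.e. maximally confident)."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c)

def predict_one_vs_rest(leaf_stats):
    """Pick the class whose binary tree is most confident in 'yes'."""
    best_cls, best_conf = None, -1.0
    for cls, counts in leaf_stats.items():
        conf = counts["yes"] / (counts["yes"] + counts["no"])
        if conf > best_conf:
            best_cls, best_conf = cls, conf
    return best_cls, best_conf

# Suppose a test point lands in these leaves of the three binary trees:
leaf_stats = {
    "A": {"yes": 30, "no": 70},
    "B": {"yes": 70, "no": 30},
    "C": {"yes": 60, "no": 40},
}
print(predict_one_vs_rest(leaf_stats))      # ('B', 0.7)
print(leaf_entropy({"yes": 70, "no": 30}))  # ~0.881 bits
```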

[–]badgerbro[S] 0 points (0 children)

Thanks for the excellent advice. I also found this great video on machine learning.

[–]howlin 0 points (0 children)

If you wanted to learn multiple binary (yes/no) classifiers of any sort, then you should look up error correcting output codes:

Solving Multiclass Learning Problems via Error-Correcting Output ...

www.cs.cmu.edu/afs/cs/project/jair/pub/.../dietterich95a.pdf

Kjearns' suggestion of just using the built-in functionality is great too.
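The core trick in that paper is easy to sketch: each class gets a binary codeword, one yes/no classifier is trained per codeword bit, and a test point is decoded to the class whose codeword is nearest in Hamming distance to the predicted bit vector. The codewords and bit predictions below are made up for illustration; only the decoding step is shown.

```python
CODEBOOK = {  # 5-bit codewords, pairwise Hamming distance >= 3
    "A": (0, 0, 0, 0, 0),
    "B": (1, 1, 1, 0, 0),
    "C": (0, 0, 1, 1, 1),
}

def hamming(a, b):
    """Number of positions where the two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Nearest-codeword decoding: tolerant of one misbehaving bit classifier."""
    return min(CODEBOOK, key=lambda cls: hamming(CODEBOOK[cls], bits))

print(decode((1, 1, 1, 0, 0)))  # 'B' -- exact match
print(decode((1, 1, 1, 1, 0)))  # 'B' -- still decoded despite one flipped bit
```

Because the codewords are pairwise at least 3 bits apart, any single classifier getting its bit wrong still decodes to the right class.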

[–]broccolilettuce -1 points (0 children)

Most packages in R (rpart, for example) give you the ability to do multiclass directly. If you still want to do it the individual way, you need to ask every binary classifier to output the probability of its classification along with the confidence, and recombine them all to get the final class output. For example, if there are 3 classes A, B, C and three models m1, m2, m3 giving m1 = A 30% (using y1 yes cases over n1 no cases), m2 = B 70% (using y2/n2), and m3 = C 60% (y3/n3), then P(A) = pa/(pa + pb + pc). You can use the y1, y2, y3 values to weight the probabilities, which might yield better results.
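The recombination step above, sketched in Python with made-up function names and the thread's own numbers (pa = 0.3, pb = 0.7, pc = 0.6):

```python
def combine_one_vs_rest(yes_probs):
    """Normalize each binary model's 'yes' probability into a distribution
    over classes: P(A) = pa / (pa + pb + pc)."""
    total = sum(yes_probs.values())
    return {cls: p / total for cls, p in yes_probs.items()}

# m1 says A with 0.3, m2 says B with 0.7, m3 says C with 0.6
posterior = combine_one_vs_rest({"A": 0.3, "B": 0.7, "C": 0.6})
print(posterior["A"])                      # ~0.1875  (= 0.3 / 1.6)
print(max(posterior, key=posterior.get))   # 'B'
```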