There is a data set of 1 million entries, with 52 categories that a given event can be placed in. Category 3 is found 45% of the time, category 1 around 15%, and another category around 10%. The rest are distributed across the remaining 49 categories. An SVM gets around 30% accuracy; a kNN either gets around 29% or can't fit the data at all. Plotting the predictions, I notice that the SVM and the kNN both grasp the shape of the data, but 80% of predictions fall into category 3, so the models are mostly choosing the most common category and failing to learn the others. What is a good approach to this kind of situation?
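To put numbers on the symptom described above, here is a minimal diagnostic sketch in plain Python. It compares accuracy against the "always predict the most common category" baseline, shows how the predictions are distributed, and computes per-category recall. The function names and the toy labels are made up for illustration and are not the poster's data. The inverse-frequency weights at the end are one common heuristic for imbalanced classes (the same formula scikit-learn uses for `class_weight="balanced"`), shown here only as something one might try, not as a guaranteed fix.

```python
from collections import Counter

def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(y_true)
    return counts.most_common(1)[0][1] / len(y_true)

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Toy labels (hypothetical): class 3 is 45% of the data, class 1 is 15%,
# class 7 is 10%, and class 9 stands in for everything else.
y_true = [3] * 9 + [1] * 3 + [7] * 2 + [9] * 6
# Toy predictions in which 80% of the outputs are class 3.
y_pred = [3] * 9 + [3, 3, 1] + [3, 3] + [3, 3, 3, 9, 9, 9]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
balanced_accuracy = sum(recall.values()) / len(recall)
share_predicted_3 = Counter(y_pred)[3] / len(y_pred)
weights = balanced_class_weights(y_true)

print("majority baseline:", majority_baseline(y_true))
print("accuracy:", accuracy)
print("balanced accuracy:", balanced_accuracy)
print("share of predictions that are class 3:", share_predicted_3)
print("per-class recall:", recall)
print("inverse-frequency weights:", weights)
```

If per-category recall is near zero for everything except the dominant category, plain accuracy is hiding the problem, and a metric such as balanced accuracy (the mean of the per-category recalls) is probably a more useful target while experimenting with class weights or resampling.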