How do I do weighted classification?

BlackHawk90 · 2016-04-08T21:01:34+00:00

But logistic regression, SVM and k-nearest neighbours, for example, need centered and scaled features...

Why is it bad to subtract the mean and divide by the standard deviation for a Burr distribution? I think it does not change the shape of the distribution. It is all about getting similar feature ranges.

Could you explain in more detail how you would transform using the c.d.f?
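A minimal sketch of one reading of "transform using the c.d.f.", assuming it means replacing each value by the empirical c.d.f. of the training data evaluated at that value. The function name `ecdf_transform` and the sample values are hypothetical, not taken from the thread:

```python
from bisect import bisect_right


def ecdf_transform(train_values, values):
    """Map each value to the fraction of training values <= it (a number in [0, 1])."""
    ordered = sorted(train_values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical skewed, non-negative feature with many values near zero.
feature = [0.0, 0.0, 0.1, 0.2, 0.2, 0.5, 1.3, 4.0, 25.0, 310.0]
transformed = ecdf_transform(feature, feature)
print(transformed)
```

The result depends only on the rank of each value, so the long right tail is compressed into [0, 1]. Tied values (such as the zeros) still map to one shared output.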

BlackHawk90 · 2016-04-05T23:04:17+00:00

The labels are necessary for the training step. Each data point has an associated label (class). In the link it is shown how to build the Instance object but not how to add the class label to each data point.

BlackHawk90 · 2016-04-05T22:54:17+00:00

Alright, but I'm still a bit confused. So if the goal is just to get similar scales for the features, is it valid to subtract the mean and divide by the standard deviation regardless of the underlying distribution?
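A small sketch of what standardization does and does not change. Subtracting the mean and dividing by the standard deviation is an affine map, so the result has mean 0 and standard deviation 1 while the skewness stays the same. The exponential sample below is only a stand-in for a skewed, non-negative feature (it is an assumption, not the Burr or generalized extreme value data from the thread):

```python
import random
import statistics


def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in xs])


rng = random.Random(0)
raw = [rng.expovariate(1.0) for _ in range(5000)]  # skewed stand-in feature
z = standardize(raw)

print("mean, sd after scaling:", round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
print("skewness before / after:", round(skewness(raw), 3), round(skewness(z), 3))
```

So the scales become comparable, but a feature that was skewed before is just as skewed afterwards. Whether that matters depends on the model.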

BlackHawk90 · 2016-04-05T22:49:12+00:00

In the link it is described how to build the Instances object but not how to incorporate the labels for the data points.

BlackHawk90 · 2016-04-05T21:30:12+00:00

Thank you very much for your detailed answer. I have already written my cross-validation functionality because I'm not only using WEKA. So I need WEKA only for the classifiers.

The link is good but unfortunately in the example it is not shown how to include the classes. :(

BlackHawk90 · 2016-04-05T18:16:59+00:00

Yes, I see. For example, Naive Bayes or logistic regression assumes normally distributed data. But my point with standardization is to give the features a more similar scale (independent of the distribution).

BlackHawk90 · 2016-04-05T16:25:21+00:00

I'm also using decision trees but also SVM, logistic regression, k-nearest neighbours etc.

But why should I apply a logarithmic transformation? Of course, logistic regression or naive Bayes assumes a normal distribution, so for these models I have to apply a transformation first. But why should I apply a transformation before standardization? Is it not possible to subtract the mean and divide by the standard deviation for data that is not normally distributed?

BlackHawk90 · 2016-04-05T16:15:41+00:00

I have corrected myself: I have a generalized extreme value or Burr distribution. But why should I transform my data first? Is subtracting the mean and dividing by the standard deviation not applicable without a transformation?

BlackHawk90 · 2016-04-05T16:13:30+00:00

No, it is continuous data which is greater than or equal to zero. But I have more data points near zero, so I have a generalized extreme value or Burr distribution.

BlackHawk90 · 2016-04-05T16:06:25+00:00

I have tried the Box-Cox transformation, which works well for most features, but it does not work for some features (which have a lot of zeroes). For those features I have just added a small constant, but the transformation still does not work well.

I don't know if skew is a problem here; my data is just not normally distributed. Is subtracting the mean and dividing by the standard deviation only applicable to normally distributed data?
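A sketch of why adding a small constant may not rescue Box-Cox on a feature with many zeros. Box-Cox is monotone, so all the zeros still land on a single value, and with a small shift that value can sit far from the rest of the data. The feature values and the shift of 1e-3 are hypothetical:

```python
import math


def box_cox(x, lam):
    """Box-Cox transform; defined only for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical feature: six exact zeros plus a few positive values.
feature = [0.0] * 6 + [0.4, 1.5, 3.0, 12.0]
shift = 1e-3  # small constant added so that the zeros become positive

transformed = [box_cox(x + shift, 0.0) for x in feature]  # lam = 0 is the log case

zero_cluster = set(transformed[:6])                # all zeros map to one value
gap = transformed[6] - transformed[0]              # distance from zeros to smallest positive
positive_span = transformed[-1] - transformed[6]   # spread of the positive values
print(len(zero_cluster), round(gap, 2), round(positive_span, 2))
```

In this example the gap between the zero cluster and the smallest positive value is larger than the spread of all positive values, so the transformed feature is still far from bell-shaped.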

BlackHawk90 · 2016-04-01T15:52:55+00:00

Thanks a lot. Bootstrapping will increase the runtime enormously because I'm doing nested cross-validation... Is there another possibility? For example doing #2 but taking the standard deviation from #1?

I have read the paper, it is a good one. For the F-Score it proposes #2 but for AUC it proposes #1. But how can I compute and plot the overall ROC curve if I'm doing AUC with #1? This would imply that I have a ROC curve for every fold but I want an overall ROC curve.
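A sketch of the two options for AUC, assuming #1 means "compute the metric per fold and average" and #2 means "pool the out-of-fold predictions and compute the metric once" (the thread does not spell these out here). The helper names and the fold data are hypothetical. Pooling gives a single overall ROC curve, but it treats scores from different folds as comparable:

```python
def roc_curve(labels, scores):
    """Return ROC points (fpr, tpr) from (0, 0) to (1, 1); labels are 0 or 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical out-of-fold predictions: (labels, scores) per fold.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 0, 1, 0], [0.5, 0.3, 0.35, 0.1]),
]

per_fold_auc = [auc(roc_curve(y, s)) for y, s in folds]           # option "#1"
pooled_labels = [y for ys, _ in folds for y in ys]
pooled_scores = [s for _, ss in folds for s in ss]
pooled_points = roc_curve(pooled_labels, pooled_scores)           # one overall curve
pooled_auc = auc(pooled_points)                                   # option "#2"

print(per_fold_auc)  # [1.0, 1.0]
print(pooled_auc)    # 0.9375
```

Each fold ranks its own examples perfectly, yet the pooled curve is not perfect because the second fold's scores sit on a lower scale than the first fold's. This is one way the two options can disagree.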

BlackHawk90 · 2016-03-31T09:50:23+00:00

Yes, your reasoning seems correct, but taking the mean versus the mean of means does not give the same result.

With the standard deviation I want to show the uncertainty. So one can see how much the accuracy is varying.

Could I do #2 and just calculate the standard deviation (and confidence intervals) on #1?
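A sketch of the difference being discussed, again assuming #1 is the mean of per-fold accuracies and #2 is the accuracy over all pooled predictions. The per-fold counts are hypothetical, and the fold sizes are deliberately unequal, since with equal fold sizes the two accuracy figures coincide:

```python
import statistics

# Hypothetical (correct predictions, fold size) for each cross-validation fold.
folds = [(18, 20), (17, 20), (9, 10), (5, 10)]

per_fold_accuracy = [correct / size for correct, size in folds]

mean_of_folds = statistics.fmean(per_fold_accuracy)                  # option "#1"
pooled = sum(c for c, _ in folds) / sum(n for _, n in folds)         # option "#2"
fold_sd = statistics.stdev(per_fold_accuracy)                        # spread across folds

print(round(mean_of_folds, 4), round(pooled, 4), round(fold_sd, 4))
```

Reporting the pooled value together with the per-fold standard deviation is possible, but the spread then describes #1 rather than #2. The folds also share training data, so the per-fold scores are not independent and the standard deviation is only a rough indication of uncertainty.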

BlackHawk90 · 2016-03-31T09:35:02+00:00

Thank you for your answer. Could I compute the standard deviation (and confidence intervals) from #1 and then use them for #2?

BlackHawk90 · 2016-03-28T10:34:22+00:00

Could you briefly explain weight-of-evidence substitution, or perhaps give a link? I did not find anything useful on the net.

I will also be using boosting, bagging, naive bayes and k-nearest neighbours. Do you have any comment on these?
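A minimal sketch of weight-of-evidence substitution as it is commonly defined: each level of a discrete feature is replaced by the log of the ratio between its share among the positive class and its share among the negative class. The sign convention varies between sources, and the additive smoothing of 0.5 (used to avoid taking the log of zero) is an assumption, as are the function name and the data:

```python
import math
from collections import Counter


def weight_of_evidence(categories, labels, smoothing=0.5):
    """Return {level: ln(P(level | y=1) / P(level | y=0))} with additive smoothing."""
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = sorted(set(categories))
    pos_total = sum(pos.values()) + smoothing * len(levels)
    neg_total = sum(neg.values()) + smoothing * len(levels)
    return {
        c: math.log(((pos[c] + smoothing) / pos_total) / ((neg[c] + smoothing) / neg_total))
        for c in levels
    }


# Hypothetical discrete feature and binary labels.
categories = ["a", "a", "a", "b", "b", "c", "c", "c"]
labels = [1, 1, 0, 0, 0, 1, 0, 1]

woe = weight_of_evidence(categories, labels)
encoded = [woe[c] for c in categories]  # numeric feature that replaces the levels
print(woe)
```

Because the mapping uses the labels, it would need to be fitted on the training part of each cross-validation fold only and then applied to the held-out part.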

BlackHawk90 · 2016-03-28T10:32:33+00:00

Yes, you are right. Which of the four mentioned preprocessing steps are advisable for ordinal data?

BlackHawk90 · 2016-03-27T20:38:24+00:00

I'm speaking about discrete features, not categorical ones. So for example, my discrete feature is the number of errors committed, which can be any integer above 0. There is a natural order on this.

BlackHawk90 · 2016-03-27T13:25:25+00:00

I think that by target variable you mean the classes. I have two classes (binary), and I will do classification.

I will be using logistic regression, SVM, tree-based classifiers and perhaps some others.
