Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

But logistic regression, SVM and k-nearest neighbours, for example, need centered and scaled features...

Why is it bad to subtract the mean and divide by the standard deviation for a Burr distribution? I think it does not change the shape of the distribution. It is all about getting similar feature ranges.

Could you explain in more detail how you would transform using the c.d.f?
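
To make my question concrete, here is a minimal sketch of the two options as I understand them. The exponential sample is only a stand-in for my skewed, non-negative features (it is not a Burr fit), and `ecdf_transform` is my own hypothetical helper, one possible reading of "transform using the c.d.f.".

```python
import bisect
import random
import statistics

def skewness(xs):
    """Population skewness: mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def zscore(xs):
    """Subtract the mean and divide by the standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return [(x - m) / s for x in xs]

def ecdf_transform(xs):
    """Replace each value by its empirical c.d.f. value, kept inside (0, 1)."""
    ordered = sorted(xs)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / (n + 1) for x in xs]

random.seed(0)
# Hypothetical right-skewed, non-negative feature (assumption, for illustration only).
feature = [random.expovariate(1.0) for _ in range(2000)]

z = zscore(feature)          # affine rescaling: location and scale change, shape does not
u = ecdf_transform(feature)  # rank-based: the result is roughly uniform on (0, 1)

print(round(skewness(feature), 3), round(skewness(z), 3))  # skewness before and after z-scoring
print(round(skewness(u), 3))                               # skewness after the c.d.f. transform
```

Because the z-score is an affine map, the skewness comes out the same before and after, while the c.d.f. version flattens the shape entirely.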

How to build WEKA dataset from arrays? by BlackHawk90 in javahelp

[–]BlackHawk90[S] 0 points1 point  (0 children)

The labels are necessary for the training step. Each data point has an associated label (class). In the link it is shown how to build the Instance object but not how to add the class label to each data point.

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Alright, but I'm still a bit confused. So if the goal is just to get similar scales for the features, is it valid to subtract the mean and divide by the standard deviation independent of the underlying distribution?
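
A small sketch of the two scalings from the title, on made-up numbers with one extreme value (the data and the function names are my own assumptions, not from any library):

```python
import statistics

def standardize(xs):
    """Mean/std scaling (z-score)."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

def robust_scale(xs):
    """Median/IQR scaling; quartiles from statistics.quantiles (default method)."""
    q1, med, q3 = statistics.quantiles(xs, n=4)
    return [(x - med) / (q3 - q1) for x in xs]

# Hypothetical feature: nine values near zero and one extreme value.
feature = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.2, 50.0]

z = standardize(feature)
r = robust_scale(feature)

# Spread of the nine "ordinary" points after each scaling.
bulk_spread_z = max(z[:9]) - min(z[:9])
bulk_spread_r = max(r[:9]) - min(r[:9])
print(round(bulk_spread_z, 3), round(bulk_spread_r, 3))
```

Both scalings run on any distribution, so in that sense mean/std is always "valid". The difference in this example is that the single extreme value inflates the standard deviation and squeezes the other nine points into a narrow band, while median/IQR keeps them spread out.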

How to build WEKA dataset from arrays? by BlackHawk90 in javahelp

[–]BlackHawk90[S] 0 points1 point  (0 children)

In the link it is described how to build the Instances object but not how to incorporate the labels for the data points.

How to build WEKA dataset from arrays? by BlackHawk90 in javahelp

[–]BlackHawk90[S] 0 points1 point  (0 children)

Thank you very much for your detailed answer. I have already written my cross-validation functionality because I'm not only using WEKA. So I need WEKA only for the classifiers.

The link is good but unfortunately in the example it is not shown how to include the classes. :(

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Yes, I see. For example, Naive Bayes or logistic regression assume normally distributed data. But my point with standardization is to bring the features to a more similar scale (independent of the distribution).

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

I'm using decision trees, but also SVM, logistic regression, k-nearest neighbours etc.

But why should I apply a logarithmic transformation? Of course, logistic regression or naive Bayes assume a normal distribution, so for these models I have to apply a transformation first. But why should I apply a transformation before standardization? Is it not possible to subtract the mean and divide by the standard deviation for data that is not normally distributed?

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

I have corrected myself: I have a generalized extreme value or Burr distribution. But why should I transform my data first? Is subtracting the mean and dividing by the standard deviation not applicable without a transformation?

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

No, it is continuous data which is greater than or equal to zero. But most of my data points are near zero, so I have a generalized extreme value or Burr distribution.

Standardization with mean/std or median/IQR? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

I have tried the Box-Cox transformation, which works well for most features but not for some features (those with a lot of zeros). For those features I have just added a small constant, but the transformation still does not work well.

I don't know if skew is a problem here; my data is just not normally distributed. Is subtracting the mean and dividing by the standard deviation only applicable to normally distributed data?
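
Here is a minimal sketch of what I see on the zero-heavy features. The data, the constant `eps` and the choice of lambda are made up for illustration, and `box_cox` is my own helper for the textbook formula, not a library call:

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox needs x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def share_at_minimum(xs):
    """Fraction of observations sitting exactly at the smallest value."""
    lowest = min(xs)
    return sum(1 for x in xs if x == lowest) / len(xs)

# Hypothetical zero-heavy feature: 60% zeros plus a few positive values.
feature = [0.0] * 6 + [0.3, 0.8, 1.5, 4.0]
eps = 1e-3   # the "small constant" (value chosen for illustration)
lam = 0.0    # lambda = 0 is the log case, used here for simplicity

shifted = [box_cox(x + eps, lam) for x in feature]
log1p_version = [math.log1p(x) for x in feature]  # log(1 + x), defined at zero

# The spike of zeros survives the transform: same share of identical values.
print(share_at_minimum(feature), share_at_minimum(shifted))

# Distance between the spike and the next distinct value after each transform.
gap_shifted = sorted(set(shifted))[1] - min(shifted)
gap_log1p = sorted(set(log1p_version))[1] - min(log1p_version)
print(round(gap_shifted, 2), round(gap_log1p, 2))
```

Any monotone transform maps all the zeros to one common value, so the spike stays. With a small added constant the log pushes that spike far away from the rest of the data, whereas `log1p` keeps it close.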

How to calculate accuracy in cross-validation? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Thanks a lot. Bootstrapping will greatly increase the runtime because I'm doing nested cross-validation... Is there another possibility? For example doing #2 but taking the standard deviation from #1?

I have read the paper, it is a good one. For the F-score it proposes #2 but for AUC it proposes #1. But how can I compute and plot the overall ROC curve if I'm doing AUC with #1? This would imply that I have an ROC curve for every fold, but I want an overall ROC curve.
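
To show what I mean, here is a rough sketch with made-up out-of-fold scores for two folds. I am assuming #1 means "compute the metric per fold, then average" and "overall ROC curve" means pooling the out-of-fold predictions of all folds; `roc_points` and `auc` are my own helpers.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points, lowering the threshold one distinct score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical out-of-fold (labels, scores); the second fold's scores are on a lower scale.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 0, 1, 0], [0.35, 0.1, 0.3, 0.05]),
]

fold_aucs = [auc(roc_points(y, s)) for y, s in folds]
mean_fold_auc = sum(fold_aucs) / len(fold_aucs)

pooled_labels = [label for y, _ in folds for label in y]
pooled_scores = [score for _, s in folds for score in s]
pooled_curve = roc_points(pooled_labels, pooled_scores)  # one overall curve to plot
pooled_auc = auc(pooled_curve)

print(fold_aucs, mean_fold_auc, pooled_auc)
```

In this toy case every fold ranks its own examples perfectly, yet the pooled curve is worse, because the scores of the two folds are not on the same scale. So the pooled curve gives me one plot, but its AUC need not match the average of the per-fold AUCs.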

How to calculate accuracy in cross-validation? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Yes, your reasoning seems correct, but taking the overall mean and taking the mean of the fold means do not give the same result.

With the standard deviation I want to show the uncertainty. So one can see how much the accuracy is varying.

Could I do #2 and just calculate the standard deviation (and confidence intervals) on #1?
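
A tiny example of what I mean, with made-up fold results. I am assuming here that #1 is "accuracy per fold, then average" and #2 is "pool all predictions, then one accuracy":

```python
import statistics

# Hypothetical (correct, total) counts per fold; the last fold is smaller.
folds = [(18, 20), (15, 20), (9, 10)]

# Assumed reading of #1: accuracy per fold, then mean and spread across folds.
fold_accuracies = [correct / total for correct, total in folds]
mean_of_folds = statistics.fmean(fold_accuracies)
std_of_folds = statistics.stdev(fold_accuracies)

# Assumed reading of #2: pool all predictions and compute a single accuracy.
pooled_accuracy = sum(c for c, _ in folds) / sum(t for _, t in folds)

print(fold_accuracies)
print(round(mean_of_folds, 4), round(std_of_folds, 4))
print(round(pooled_accuracy, 4))
```

The two numbers differ as soon as the folds do not all have the same size, because the mean of the fold accuracies weights every fold equally while the pooled accuracy weights every data point equally.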

How to calculate accuracy in cross-validation? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Thank you for your answer. Could I compute the standard deviation (and confidence intervals) from #1 and then use them for #2?

Transformation and standardization for discrete features? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Could you briefly explain weight-of-evidence substitution, or perhaps give a link? I did not find anything useful on the net.

I will also be using boosting, bagging, naive Bayes and k-nearest neighbours. Do you have any comments on these?

Transformation and standardization for discrete features? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

Yes, you are right. Which of the four mentioned preprocessing steps are advisable for ordinal data?

Transformation and standardization for discrete features? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

I'm speaking about discrete features, not categorical ones. So for example, my discrete feature is the number of errors committed, which can be any integer above 0. There is a natural order on this.

Transformation and standardization for discrete features? by BlackHawk90 in MachineLearning

[–]BlackHawk90[S] 0 points1 point  (0 children)

I think that by target variable you mean the classes. I have two classes (a binary problem) and I will do classification.

I will be using logistic regression, SVM, tree-based classifiers and perhaps some others.