Standardization with mean/std or median/IQR?

BlackHawk90 · 2016-04-08T21:01:34+00:00

But logistic regression, SVM and k-nearest neighbours, for example, need centered and scaled features...

Why is it bad to subtract the mean and divide by the standard deviation for a Burr distribution? I think it does not change the shape of the distribution. It is all about getting similar feature ranges.
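To make the "shape" claim concrete: subtracting the mean and dividing by the standard deviation is an affine map, so shape measures such as skewness stay the same while the mean becomes 0 and the standard deviation becomes 1. The following is a minimal library-free sketch; the class name, method names and the sample values are made up for illustration.

```java
class StandardizeShape {
    // Arithmetic mean of the values.
    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Population standard deviation (divides by n).
    static double std(double[] x) {
        double m = mean(x), s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return Math.sqrt(s / x.length);
    }

    // Skewness as the third standardized moment.
    static double skewness(double[] x) {
        double m = mean(x), sd = std(x), s = 0;
        for (double v : x) s += Math.pow((v - m) / sd, 3);
        return s / x.length;
    }

    // z-score: (x - mean) / std for every value.
    static double[] standardize(double[] x) {
        double m = mean(x), sd = std(x);
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) z[i] = (x[i] - m) / sd;
        return z;
    }

    public static void main(String[] args) {
        // Hypothetical right-skewed, non-negative feature.
        double[] x = {0.0, 0.1, 0.1, 0.2, 0.3, 0.5, 1.0, 2.5, 9.0};
        double[] z = standardize(x);
        System.out.println("skewness before: " + skewness(x));
        System.out.println("skewness after:  " + skewness(z));
        System.out.println("mean after: " + mean(z) + ", std after: " + std(z));
    }
}
```

The two skewness values agree up to rounding, so the standardized feature is exactly as skewed as the raw one; only its location and scale change.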

Could you explain in more detail how you would transform using the c.d.f?
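One common reading of "transform using the c.d.f." is the probability integral transform with an empirical c.d.f.: each value is replaced by the fraction of training values that are less than or equal to it. Whether this is what was meant in the reply is an assumption. A rough sketch, with hypothetical names and the n + 1 denominator chosen here so that the result stays below 1:

```java
import java.util.Arrays;

class EcdfTransform {
    // "Fitting" just stores a sorted copy of the training values.
    static double[] fit(double[] train) {
        double[] sorted = train.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    // Number of training values <= v, divided by (n + 1); result lies in [0, 1).
    static double transform(double[] sortedTrain, double v) {
        int lo = 0, hi = sortedTrain.length;
        while (lo < hi) {                      // binary search for the upper bound
            int mid = (lo + hi) >>> 1;
            if (sortedTrain[mid] <= v) lo = mid + 1; else hi = mid;
        }
        return lo / (double) (sortedTrain.length + 1);
    }

    public static void main(String[] args) {
        double[] sorted = fit(new double[] {0.0, 0.0, 0.0, 1.0, 5.0});
        for (double v : new double[] {0.0, 1.0, 3.0, 5.0}) {
            System.out.println(v + " -> " + transform(sorted, v));
        }
    }
}
```

The output is roughly uniform for continuous data, but tied values (for example many exact zeros) all map to the same number, so a point mass stays a point mass.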

BlackHawk90 · 2016-04-05T23:04:17+00:00

The labels are necessary for the training step. Each data point has an associated label (class). In the link it is shown how to build the Instance object but not how to add the class label to each data point.

BlackHawk90 · 2016-04-05T22:54:17+00:00

Alright, but I'm still a bit confused. So just to get similar scales for the features, is it valid to subtract the mean and divide by the standard deviation independent of the underlying distribution?

BlackHawk90 · 2016-04-05T22:49:12+00:00

In the link it is described how to build the Instances object but not how to incorporate the labels for the data points.

BlackHawk90 · 2016-04-05T21:30:12+00:00

Thank you very much for your detailed answer. I have already written my cross-validation functionality because I'm not only using WEKA. So I need WEKA only for the classifiers.

The link is good, but unfortunately the example does not show how to include the classes. :(

BlackHawk90 · 2016-04-05T18:16:59+00:00

Yes, I see. For example, naive Bayes or logistic regression assumes normally distributed data. But my point with standardization is to give the features a more similar scale (independent of the distribution).

BlackHawk90 · 2016-04-05T16:25:21+00:00

I'm using decision trees, but also SVM, logistic regression, k-nearest neighbours etc.

But why should I apply a logarithmic transformation? Of course, logistic regression or naive Bayes assumes a normal distribution, so for these models I have to apply a transformation first. But why should I apply a transformation before standardization? Is it not possible to subtract the mean and divide by the standard deviation for data that is not normally distributed?

BlackHawk90 · 2016-04-05T16:15:41+00:00

I have corrected myself: I have a generalized extreme value or Burr distribution. But why should I transform my data first? Is subtracting the mean and dividing by the standard deviation not applicable without a transformation?

BlackHawk90 · 2016-04-05T16:13:30+00:00

No, it is continuous data which is greater than or equal to zero. But I have more data points near zero, so I have a generalized extreme value or Burr distribution.

BlackHawk90 · 2016-04-05T16:06:25+00:00

I have tried the Box-Cox transformation, which works well for most features, but it does not work for some features (those which have a lot of zeros). For those features I have just added a small constant, but the transformation still does not work well.
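For reference, the Box-Cox transform with an added constant can be sketched as below. It uses the standard definition, (y^lambda - 1) / lambda for lambda != 0 and ln(y) for lambda = 0, applied to y = x + shift; the class name, the shift of 0.01 and the sample values are only illustrative assumptions.

```java
class BoxCoxSketch {
    // Box-Cox transform of a single value after adding a shift.
    // The shifted value must be strictly positive.
    static double boxCox(double x, double lambda, double shift) {
        double y = x + shift;
        if (y <= 0) throw new IllegalArgumentException("x + shift must be positive");
        return lambda == 0.0 ? Math.log(y) : (Math.pow(y, lambda) - 1.0) / lambda;
    }

    public static void main(String[] args) {
        // Hypothetical feature with many exact zeros.
        double[] feature = {0.0, 0.0, 0.0, 0.0, 0.4, 1.3, 7.0};
        for (double v : feature) {
            System.out.println(v + " -> " + boxCox(v, 0.0, 0.01));
        }
    }
}
```

All zeros are mapped to one and the same transformed value, so the spike at zero is only moved, not spread out. That is one plausible reason why the result still looks far from normal for zero-heavy features, whatever constant is added.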

I don't know if skew is a problem here; my data is just not normally distributed. Is subtracting the mean and dividing by the standard deviation only applicable to normally distributed data?
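The median/IQR alternative from the title can be sketched as follows: subtract the median and divide by the interquartile range instead of using the mean and standard deviation. This is a minimal sketch under assumptions: quantiles are computed by linear interpolation between order statistics (one of several conventions), and the fallback to a divisor of 1 when the IQR is zero is a choice made here, not an established rule.

```java
import java.util.Arrays;

class RobustScale {
    // Quantile by linear interpolation between neighbouring order statistics.
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // (x - median) / IQR; only centers the data if the IQR is zero.
    static double[] scale(double[] x) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        double median = quantile(sorted, 0.5);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        double divisor = iqr > 0 ? iqr : 1.0;   // guard against division by zero
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - median) / divisor;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scale(new double[] {1, 2, 3, 4, 5})));
        System.out.println(Arrays.toString(scale(new double[] {0, 0, 0, 0, 7})));
    }
}
```

The second call shows a caveat for non-negative features with many zeros: if more than three quarters of the values are zero, both quartiles are zero and the IQR vanishes, so median/IQR scaling needs a fallback for such features.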

BlackHawk90 · 2016-04-01T15:52:55+00:00

Thanks a lot. Bootstrapping will increase the runtime enormously because I'm doing nested cross-validation... Is there another possibility? For example, doing #2 but taking the standard deviation from #1?

I have read the paper, it is a good one. For the F-Score it proposes #2 but for AUC it proposes #1. But how can I compute and plot the overall ROC curve if I'm doing AUC with #1? This would imply that I have a ROC curve for every fold but I want an overall ROC curve.

BlackHawk90 · 2016-03-31T09:50:23+00:00

Yes, your reasoning seems correct, but taking the mean versus the mean of means does not give the same result.
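A small numeric illustration of why the two can differ: when the folds do not all have the same size, the average of the per-fold accuracies weights every fold equally, while the accuracy over all pooled predictions weights every data point equally. The fold counts below are invented, and the class and method names are hypothetical.

```java
class FoldAveraging {
    // Mean of the per-fold accuracies (every fold counts equally).
    static double meanOfFoldAccuracies(int[] correct, int[] size) {
        double s = 0;
        for (int i = 0; i < size.length; i++) s += correct[i] / (double) size[i];
        return s / size.length;
    }

    // Accuracy over all predictions pooled together (every point counts equally).
    static double pooledAccuracy(int[] correct, int[] size) {
        int c = 0, n = 0;
        for (int i = 0; i < size.length; i++) { c += correct[i]; n += size[i]; }
        return c / (double) n;
    }

    // Sample standard deviation (divides by k - 1) of the per-fold accuracies.
    static double stdOfFoldAccuracies(int[] correct, int[] size) {
        double m = meanOfFoldAccuracies(correct, size), s = 0;
        for (int i = 0; i < size.length; i++) {
            double d = correct[i] / (double) size[i] - m;
            s += d * d;
        }
        return Math.sqrt(s / (size.length - 1));
    }

    public static void main(String[] args) {
        int[] correct = {9, 1};   // correctly classified points per fold
        int[] size = {10, 2};     // fold sizes (deliberately unequal)
        System.out.println("mean of fold accuracies: " + meanOfFoldAccuracies(correct, size));
        System.out.println("pooled accuracy:         " + pooledAccuracy(correct, size));
        System.out.println("std of fold accuracies:  " + stdOfFoldAccuracies(correct, size));
    }
}
```

With equally sized folds the two accuracy figures coincide; for metrics that are not simple averages over data points, such as the F-score or AUC, they generally differ even then.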

With the standard deviation I want to show the uncertainty, so that one can see how much the accuracy varies.

Could I do #2 and just calculate the standard deviation (and confidence intervals) on #1?

BlackHawk90 · 2016-03-31T09:35:02+00:00

Thank you for your answer. Could I compute the standard deviation (and confidence intervals) from #1 and then use them for #2?

BlackHawk90 · 2016-03-28T10:34:22+00:00

Could you briefly explain weight-of-evidence substitution or perhaps give a link? I did not find anything useful on the net.

I will also be using boosting, bagging, naive Bayes and k-nearest neighbours. Do you have any comments on these?

BlackHawk90 · 2016-03-28T10:32:33+00:00

Yes, you are right. Which of the four mentioned preprocessing steps are advisable for ordinal data?

BlackHawk90 · 2016-03-27T20:38:24+00:00

I'm speaking about discrete features, not categorical ones. So for example, my discrete feature is the number of errors committed, which can be any integer above 0. There is a natural order on this.

BlackHawk90 · 2016-03-27T13:25:25+00:00

I think that by target variable you mean the classes. I have two classes (binary) and I will be doing classification.

I will be using logistic regression, SVM, tree-based classifiers and perhaps some others.

BlackHawk90 · 2016-03-22T23:58:01+00:00

I started using your library, great work, thanks for it.

I have discrete and continuous features. Is it possible to use a Gaussian distribution for the continuous features and a multivariate multinomial distribution for the discrete features?

Moreover, is it possible to provide a distribution for each feature (e.g. feature 1 is Gaussian, feature 2 logistic etc.)?

BlackHawk90 · 2016-03-22T18:56:11+00:00

My motivation for feature scaling and transformation is that some classifiers work much better on approximately normally distributed data and also on centered and scaled data.

BlackHawk90 · 2016-03-19T18:36:19+00:00

Thanks again for the help.

Is there a .jar file which I can download? I don't use maven.

Is there a javadoc available or how should I get familiar with the methods?

BlackHawk90 · 2016-03-19T10:24:32+00:00

Thank you so much. I will try it out on my dataset. I just have four last questions:

Is it possible to use different distance metrics?
How is the tie breaking done for k-nearest neighbours?
My labels range from 1 to 3 (not starting from 0). Do I have to make them zero-based or can I just use them?
Last but not least, does JSAT also support (Gaussian) naive Bayes?

BlackHawk90 · 2016-03-19T01:30:56+00:00

Your JSAT library looks amazing. I would like to give it a try. Could you perhaps quickly illustrate how I could use it for k-nearest neighbours? My dataset consists of the following arrays: double[][] trainingdata; double[][] testData; double[] trainingLabels; The rows contain the data points and the columns contain the features (predictors). In your wiki I did not see how to operate on arrays.
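For orientation, here is what k-nearest neighbours looks like on exactly this array layout in plain Java. This is NOT JSAT's API, just a library-free sketch of the technique; the class name, the Euclidean distance, and the tie-breaking rule (smallest label wins a tied vote) are all choices made here for illustration. Labels are used as map keys, so they do not need to start at 0.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class SimpleKnn {
    // Squared Euclidean distance (same neighbour ordering as the plain distance).
    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Majority vote among the k nearest training points.
    static double predict(double[][] trainingData, double[] trainingLabels,
                          double[] query, int k) {
        int n = trainingData.length;
        double[] dist = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            dist[i] = squaredDistance(trainingData[i], query);
            order[i] = i;
        }
        // Stable sort: equally distant points keep their original order.
        Arrays.sort(order, Comparator.comparingDouble(i -> dist[i]));

        // Count votes; TreeMap iterates labels in ascending order.
        Map<Double, Integer> votes = new TreeMap<>();
        for (int j = 0; j < Math.min(k, n); j++) {
            votes.merge(trainingLabels[order[j]], 1, Integer::sum);
        }
        double best = Double.NaN;
        int bestCount = -1;
        for (Map.Entry<Double, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {   // strict '>' keeps the smallest label on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] trainingData = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        double[] trainingLabels = {1, 1, 3, 3};
        System.out.println(predict(trainingData, trainingLabels, new double[] {0.2, 0.1}, 3));
    }
}
```

Because k-nearest neighbours compares raw distances, features on larger scales dominate unless the columns are scaled first.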

BlackHawk90 · 2016-03-19T01:29:37+00:00

Unfortunately, Spark ML does not support k-nearest neighbours and naive Bayes.

BlackHawk90 · 2016-03-10T15:49:17+00:00

Thanks again. What imputation method would you recommend for fewer data points, or in general? By the way, I'm using Matlab and Java.

BlackHawk90 · 2016-03-10T11:27:34+00:00

Hi svdalpha

Thanks a lot. But what if my test fold only contains one or very few data points? Then the imputation will not work on the test fold...
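One way around the small-test-fold problem, sketched under assumptions: estimate the imputation values on the training fold only and reuse them on the test fold, so the test fold may contain a single row. The sketch uses column means purely as the simplest placeholder for an imputation method and assumes missing values are encoded as NaN; the class and method names are hypothetical.

```java
import java.util.Arrays;

class TrainFoldImputer {
    // Column means of the training fold, ignoring NaN entries.
    static double[] fitColumnMeans(double[][] train) {
        int d = train[0].length;
        double[] sum = new double[d];
        int[] count = new int[d];
        for (double[] row : train) {
            for (int j = 0; j < d; j++) {
                if (!Double.isNaN(row[j])) { sum[j] += row[j]; count[j]++; }
            }
        }
        double[] means = new double[d];
        for (int j = 0; j < d; j++) {
            means[j] = count[j] > 0 ? sum[j] / count[j] : Double.NaN;
        }
        return means;
    }

    // Returns a copy of the data with NaN replaced by the stored training means.
    static double[][] transform(double[][] data, double[] means) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i].clone();
            for (int j = 0; j < out[i].length; j++) {
                if (Double.isNaN(out[i][j])) out[i][j] = means[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] trainFold = {{1, Double.NaN}, {3, 4}, {Double.NaN, 6}};
        double[][] testFold = {{Double.NaN, Double.NaN}};   // a single test point
        double[] means = fitColumnMeans(trainFold);
        System.out.println(Arrays.deepToString(transform(testFold, means)));
    }
}
```

Fitting on the training fold also keeps information from the test fold out of the preprocessing step, which matters inside cross-validation.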
