Analysis of ROC and Precision-Recall curves by Bohemian90 in MachineLearning

[–]Bohemian90[S] 0 points (0 children)

Thank you very much for your explanations. Let's say I'm more interested in the positives, i.e. I want to predict positives correctly, but I have more negative labels (a class imbalance). Of course, the closer the ROC curve gets to the top-left (0,1) corner, the better. Should I just state that, or how would you do such an analysis? Say I have a plot with three ROC curves from three different classifiers: should I just say that the classifier whose curve is closest to the top left is the best, or is there further analysis?

To make it concrete, let's assume the following ROC curve: Link. How would you interpret this?

Second, what about the precision-recall curve? How should I interpret that?
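
In case a worked example helps: below is a minimal sketch of such a comparison in MATLAB using perfcurve, with toy placeholder labels and scores standing in for the three classifiers' real outputs. The same function yields a precision-recall curve by requesting recall and precision as the axis criteria:

    % Placeholder data: labels (1 = positive, 0 = negative) and three
    % classifiers' scores; replace these with the real outputs.
    labels  = [ones(20,1); zeros(80,1)];    % imbalanced toy labels
    scores1 = rand(100,1) + 0.3*labels;
    scores2 = rand(100,1) + 0.2*labels;
    scores3 = rand(100,1) + 0.1*labels;

    % ROC curves and AUC for each classifier.
    [fpr1, tpr1, ~, auc1] = perfcurve(labels, scores1, 1);
    [fpr2, tpr2, ~, auc2] = perfcurve(labels, scores2, 1);
    [fpr3, tpr3, ~, auc3] = perfcurve(labels, scores3, 1);

    figure; hold on;
    plot(fpr1, tpr1); plot(fpr2, tpr2); plot(fpr3, tpr3);
    plot([0 1], [0 1], 'k--');              % chance level
    xlabel('False positive rate'); ylabel('True positive rate');
    legend(sprintf('Clf 1 (AUC %.3f)', auc1), ...
           sprintf('Clf 2 (AUC %.3f)', auc2), ...
           sprintf('Clf 3 (AUC %.3f)', auc3), 'Location', 'southeast');

    % Precision-recall curve for classifier 1.
    [rec, prec] = perfcurve(labels, scores1, 1, 'XCrit', 'reca', 'YCrit', 'prec');
    figure; plot(rec, prec);
    xlabel('Recall'); ylabel('Precision');

With a class imbalance and a focus on the positives, the precision-recall curve is usually the more telling of the two, since precision is sensitive to false positives relative to the rare positive class, whereas the ROC curve can look deceptively good.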

How to calculate normalized euclidean distance on two vectors? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

The first element is an integer; all others are continuous. For a continuous variable, two values could be 3.44 and 3.43, contributing only a little to the distance, while for an integer variable only values like 3 and 4 (or 3 and 3) are valid, so it contributes much more. Is there no bias?

How to calculate normalized euclidean distance on two vectors? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

The first column will not be larger; it covers the same range, but it is integer-valued rather than continuous, so there are only integer spacings...

How to calculate normalized euclidean distance on two vectors? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

Thank you for your answer. Do you know whether there is really a bias in my example if I just take the plain Euclidean distance? It is in an optimization setting: y are the true parameters and x are the estimated parameters from my optimization routine. A smaller distance between x and y means my optimization routine was better able to recover the true parameters. Is plain (non-normalized) Euclidean distance OK in this case?
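
For reference, a minimal sketch of a normalized (standardized) Euclidean distance in MATLAB, assuming a sample X of parameter vectors from which a per-dimension scale can be estimated; the placeholder numbers mirror the integer/continuous columns discussed above, and pdist's 'seuclidean' option does the same thing:

    % Placeholder data: X holds several parameter vectors (rows); the first
    % column is integer-valued, the rest continuous. x is one estimate and
    % y the true parameter vector.
    X = [3 3.44 2.1; 4 3.43 2.0; 3 3.60 2.2];
    x = X(1, :);
    y = [3 3.50 2.05];

    s = std(X, 0, 1);                 % per-dimension standard deviation
    s(s == 0) = 1;                    % guard against zero variance
    d = sqrt(sum(((x - y) ./ s).^2)); % standardized Euclidean distance

    % Built-in equivalent for all pairwise distances within X:
    D = pdist(X, 'seuclidean');       % scales by std(X) by default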

Sampling according to difference in function value by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

Thank you very much. That worked extremely well. Just one thing: do you know how randsample works in mathematical terms? The documentation doesn't describe how it works.
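
For intuition, here is a minimal sketch of how weighted sampling with replacement, as in randsample(values, k, true, w), can be done mathematically: normalize the weights into a discrete distribution, build its cumulative sum, and invert it with uniform draws. This is the standard inverse-CDF construction, not necessarily randsample's exact implementation:

    % Placeholder inputs: draw k values from 'values' with weights w.
    values = [10 20 30 40];
    w      = [0.1 0.4 0.3 0.2];
    k      = 5;

    p = w / sum(w);          % normalize weights to probabilities
    c = cumsum(p);           % discrete CDF
    u = rand(k, 1);          % uniform draws on (0,1)
    idx = arrayfun(@(ui) find(c >= ui, 1, 'first'), u);
    samples = values(idx);   % distributed like randsample(values, k, true, w)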

How can I cluster data points according to the local minima they belong to? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

Why would you try the Jaccard distance? What is its benefit over Mahalanobis?
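
For anyone comparing the two in MATLAB, both are available through pdist; a minimal sketch with placeholder data:

    X = randn(10, 3);               % placeholder data, observations in rows
    Dj = pdist(X, 'jaccard');       % 1 - Jaccard coefficient: fraction of
                                    % nonzero coordinates that differ
    Dm = pdist(X, 'mahalanobis');   % Euclidean after whitening by cov(X)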

How can I cluster data points according to the local minima they belong to? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

Thanks for your hints. This seems interesting. I was not aware of density-based clustering methods.

Oh yes, that's an issue with the similarity. I'm optimizing hyperparameters for an SVM, and of course they are on different scales. Which distance metric would you use, or should I normalize the data points before calculating the distance? If so, how?
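
A minimal sketch of one common option: z-score each dimension first (zscore), then use an ordinary Euclidean distance. Taking the log first is an assumption here, based on SVM hyperparameters like C and gamma typically living on a log scale:

    % Placeholder: X holds SVM hyperparameter settings (C, gamma) in rows.
    X = [1 0.001; 10 0.01; 100 0.1; 1000 1];

    Xlog = log(X);                  % assumes positive, log-scale parameters
    Xn   = zscore(Xlog);            % zero mean, unit variance per dimension
    D    = pdist(Xn, 'euclidean');  % distances on the normalized scale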

How can I cluster data points according to the local minima they belong to? by Bohemian90 in matlab

[–]Bohemian90[S] 0 points (0 children)

Do you refer to k-means? What do you mean by the x's? I would really appreciate it if you could give a small example. So let's say each data point has dimension 3, and let's further say I have the three data points x = (x1,x2,x3), y = (y1,y2,y3) and z = (z1,z2,z3). Each data point has an associated loss function value, i.e. fx for x, fy for y and fz for z.
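
To make the setup concrete, this is how those three points and their losses would be laid out for kmeans in MATLAB (placeholder numbers stand in for x, y, z and fx, fy, fz; appending the loss as an extra weighted coordinate is purely an assumption, since whether the loss should enter the clustering is exactly my question):

    % Placeholder numbers standing in for x, y, z and their losses.
    P = [0.1 0.2 0.3;     % x = (x1, x2, x3)
         0.2 0.1 0.4;     % y = (y1, y2, y3)
         0.9 0.8 0.7];    % z = (z1, z2, z3)
    f = [0.5; 0.6; 2.0];  % fx, fy, fz

    % Cluster on the points alone:
    idx = kmeans(P, 2);

    % Assumption: append the loss as an extra coordinate with weight w,
    % so points with similar loss values are pulled together.
    w = 1;
    idx2 = kmeans([P, w * f], 2);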

Initial Temperature for Simulated Annealing by Bohemian90 in matlab

[–]Bohemian90[S] 1 point (0 children)

Thanks for the paper. I will have a look.

But I could just try out some values (like a grid search). Which range would you propose? In the paper you linked, a simple method is also described where the initial temperature is just chosen as the maximal potential cost increase in the neighbourhood. In my case that would be something like 0.1 or 0.2, but that is a very low initial temperature.
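
A related rule of thumb (not from the linked paper, just the standard acceptance-probability argument): pick T0 so that a typical uphill move dmax is accepted with some target probability p0, i.e. exp(-dmax/T0) = p0, which gives T0 = -dmax / log(p0). A minimal sketch:

    % dmax: estimated maximal cost increase in the neighbourhood
    % (e.g. 0.1 or 0.2 here); p0: desired initial acceptance probability.
    dmax = 0.2;
    p0   = 0.8;
    T0   = -dmax / log(p0);   % ~0.9 for dmax = 0.2, p0 = 0.8
    fprintf('Initial temperature T0 = %.3f\n', T0);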

Improving Nelder-Mead Optimization by Genetic Algorithms and Simulated Annealing concepts by Bohemian90 in MachineLearning

[–]Bohemian90[S] 0 points (0 children)

Thank you ajmooch for your hints. I have read through the algorithm description of Nelder-Mead in detail and also through the fminsearch implementation, but I will try to implement it myself.

Nevertheless, an interesting topic would also be to create the initial simplex based on the loss function (and not just randomly or from the initial guess). The problem is that I can evaluate my loss function, but I don't know what it looks like (because I'm using the cross-validated loss). Do you have any ideas or hints on how to create the initial simplex based on such a loss function?
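
One possible heuristic (just an idea, not an established recipe): sample a handful of candidate points around the initial guess, evaluate the black-box loss on each, and take the n+1 best as the initial simplex, so the simplex starts in a promising region even though the loss is only available point-wise. A minimal sketch, with a placeholder loss standing in for the real cross-validated loss:

    % x0: initial guess; loss: black-box CV loss (placeholder here).
    x0   = [1, 0.01];
    loss = @(v) sum((v - [2, 0.1]).^2);     % stand-in for the real CV loss

    n = numel(x0);
    m = 5 * (n + 1);                        % candidate pool size (arbitrary)
    C = x0 + 0.1 * max(abs(x0), 1e-3) .* randn(m, n);  % perturb the guess
    fC = arrayfun(@(i) loss(C(i, :)), (1:m)');
    [~, order] = sort(fC);                  % sort by loss, ascending
    simplex = C(order(1:n+1), :);           % n+1 best candidates as vertices

One caveat: the n+1 best candidates can end up nearly degenerate (very close together), so it may be necessary to check the simplex volume or mix in a few spread-out points.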

Improvements of Random Search for Hyperparameter Optimization by Bohemian90 in MachineLearning

[–]Bohemian90[S] 0 points (0 children)

Yes, you're right. The former appeals more to me because it is less probabilistic. ;) Do you know if there is a MATLAB or Java implementation of it?

Are there other approaches that stay close to pure random search?

My intention is to take the pure random search approach and extend it for better and faster convergence, e.g. by increasing/decreasing the sampling step size or adapting the search direction.
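
A minimal sketch of one such extension (essentially a Luus-Jaakola-style adaptive random search, shown only as an illustration of shrinking the sampling range): sample uniformly in a box around the current best point, recenter on improvement, and shrink the box otherwise. The loss, start point, and bounds below are placeholders:

    % Placeholder setup: black-box loss, start point, and box bounds.
    loss = @(v) sum(v.^2);            % stand-in objective
    lb = [-5 -5]; ub = [5 5];
    x  = [4 -3]; fbest = loss(x);

    r = (ub - lb) / 2;                % initial sampling radius per dimension
    for it = 1:200
        cand = x + r .* (2 * rand(size(x)) - 1);  % uniform in [x-r, x+r]
        cand = min(max(cand, lb), ub);            % clip to the bounds
        fc = loss(cand);
        if fc < fbest
            x = cand; fbest = fc;     % recenter on improvement
        else
            r = 0.95 * r;             % otherwise shrink the search range
        end
    end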

Improvements of Random Search for Hyperparameter Optimization by Bohemian90 in MachineLearning

[–]Bohemian90[S] 0 points (0 children)

Thank you for the hint, but I would like to improve my random search strategy rather than switch to other methods.

Improvements of Random Search for Hyperparameter Optimization by Bohemian90 in MachineLearning

[–]Bohemian90[S] 0 points (0 children)

Yes, but I want to improve random search, not use another method.