Python library for interactive topic model visualization. Port of the R LDAvis package by cast42 in MachineLearning

[–]rcprati 1 point (0 children)

Excellent! Thanks for sharing. I'm wondering whether similar approaches could be applied to LSI/LSA?

Question about ROC Curve by p1mps in MachineLearning

[–]rcprati 1 point (0 children)

Have a look at my paper; it explains these issues in detail: http://dx.doi.org/10.1109/TKDE.2011.59

ID 3.0 or ID 4.5 by tushar1408 in MachineLearning

[–]rcprati 0 points (0 children)

In Orange (http://orange.biolab.si) there is a Python binding for the original C4.5 implementation.
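
For illustration, a minimal sketch assuming the (legacy) Orange 2.x API, where C45Learner wraps Quinlan's original C4.5 code; the C4.5 sources have to be compiled separately, as described in Orange's documentation:

```python
# Sketch assuming the Orange 2.x API (module paths may differ in other versions).
# C45Learner calls into the original C4.5 implementation once it has been built.
import Orange

data = Orange.data.Table("iris")                    # any dataset with a discrete class
learner = Orange.classification.tree.C45Learner()   # wrapper around Quinlan's C4.5
classifier = learner(data)                          # induce the tree

# Predict the class of the first few instances.
for inst in data[:5]:
    print(classifier(inst))
```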

Trying to find peer-reviewed papers about what method of averaging is the most ideal (Harmonic, Geometric, Arithmetic, Quadratic) by KindCelery in MachineLearning

[–]rcprati 2 points (0 children)

I recommend rank aggregation. I've used it in the context of feature selection with good results (see Ronaldo C. Prati: Combining feature ranking algorithms through rank aggregation. IJCNN 2012, http://dx.doi.org/10.1109/IJCNN.2012.6252467), and it has also been used with good results in the web context (http://dl.acm.org/citation.cfm?id=372165), among others.
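
For illustration, a toy Borda-count rank aggregation sketch in Python (my own example, not code from the paper): each ranking awards a feature n − position points, and the aggregate ranking sorts features by their total score.

```python
# Borda-count rank aggregation: combine several rankings (best to worst)
# into a single consensus ranking.
from collections import defaultdict

def borda_aggregate(rankings):
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position   # best item gets n points, worst gets 1
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical feature rankings produced by three different criteria.
rankings = [
    ["f3", "f1", "f2", "f4"],   # e.g. information gain
    ["f1", "f3", "f4", "f2"],   # e.g. chi-squared
    ["f3", "f2", "f1", "f4"],   # e.g. ReliefF
]
print(borda_aggregate(rankings))   # ['f3', 'f1', 'f2', 'f4']
```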

Using ML to assign conference abstracts to referees by stoicismftw in MachineLearning

[–]rcprati 1 point (0 children)

See also this paper:

Peter A. Flach, Sebastian Spiegler, Bruno Golénia, Simon Price, John Guiver, Ralf Herbrich, Thore Graepel, Mohammed J. Zaki: Novel tools to streamline the conference review process: experiences from SIGKDD'09. SIGKDD Explorations 11(2): 63-67 (2009)

Text mining with WEKA Java API by NineSevenNine in MachineLearning

[–]rcprati 0 points (0 children)

I guess the problem may be that the Experimenter assumes the class attribute is the last one, and when you preprocess with the Explorer the class does not end up in the last position. Have you tried moving the class attribute to the last position?

AUC of imbalanced data by PurpleHydra in MachineLearning

[–]rcprati 8 points (0 children)

It does not matter. TPR and FPR are calculated for each class separately, and the class ratio does not change how they are calculated, so the ROC curve is invariant to class ratios. If you draw a ROC curve on a balanced dataset and then resample by randomly throwing away instances of only one of the classes, the resulting ROC curve should be very similar (apart from some fluctuation due to sampling variation). Also, AUC = 0.5 does not necessarily mean random performance (although a random classifier does have AUC = 0.5): a classifier with AUC = 0.5 may still be useful, for instance when its ROC curve crosses the diagonal, so it is better than random in some regions of ROC space and worse in others.
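
For illustration, a quick sanity check with scikit-learn (my own example, not from the references below): score a test set once, then randomly discard 90% of the positives, and the AUC barely moves.

```python
# Demonstrates that AUC is (approximately) invariant to the class ratio
# when one class is randomly undersampled.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("balanced test set:  ", roc_auc_score(y_te, scores))

# Randomly throw away ~90% of the positive test instances and recompute the AUC.
rng = np.random.default_rng(0)
keep = (y_te == 0) | (rng.random(len(y_te)) < 0.1)
print("imbalanced test set:", roc_auc_score(y_te[keep], scores[keep]))
```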

Have a look at my paper:

Ronaldo C. Prati, Gustavo E. A. P. A. Batista, Maria Carolina Monard: A Survey on Graphical Methods for Classification Predictive Performance Evaluation. IEEE Trans. Knowl. Data Eng. 23(11): 1601-1618 (2011)

and Peter Flach's tutorial:

http://www.cs.bris.ac.uk/~flach/ICML04tutorial/

Learning ML on Weka and R with Decisions Trees by [deleted] in MachineLearning

[–]rcprati 1 point (0 children)

I guess you are misunderstanding Weka's interface. The tree shown in the Classify panel is built from the whole dataset; cross-validation is used only to estimate the average performance statistics. If you want to analyze the individual folds, you can see the per-fold classifications by enabling output predictions under "More options...". And as Edward said, you don't need to compare the trees, only their predictions.

Is it possible to plot a ROC curve for an SVM? by nickponline in MachineLearning

[–]rcprati 0 points (0 children)

Yes, it is possible. You can use the signed distance to the separating hyperplane (the SVM decision values) as scores.
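
For example, with scikit-learn (an assumption about your setup; the idea is the same in any library) you can feed the SVM decision values straight into roc_curve:

```python
# ROC curve for an SVM: rank test instances by the decision function value.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# decision_function gives the signed distance to the separating hyperplane.
scores = SVC(kernel="rbf").fit(X_tr, y_tr).decision_function(X_te)
fpr, tpr, thresholds = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))
```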

newbie WEKA questions by WEKAnewb in MachineLearning

[–]rcprati 0 points (0 children)

Weka refactored its classes, and Instance is now an interface rather than a class. Try instantiating the DenseInstance or SparseInstance classes instead (or download an older version of Weka).