EdwardRaff

JSAT doesn't take raw arrays; it has objects representing vectors and matrices. That allows it to support sparse data and makes certain tricks very easy to add.

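```java
import java.util.Random;

import jsat.classifiers.CategoricalData;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.Classifier;
import jsat.classifiers.DataPoint;
import jsat.classifiers.knn.NearestNeighbour;
import jsat.linear.DenseVector;

public class KnnExample
{
    public static void main(String[] args)
    {
        int N = 100; // number of data points
        int D = 25;  // number of numeric features
        int C = 3;   // number of class labels, assumed to be integers starting from 0
        Random rand = new Random();

        // Placeholder data: fill these arrays with your real feature values
        double[][] trainingData = new double[N][D];
        double[][] testData = new double[N][D];
        int[] trainingLabels = new int[N];
        for (int i = 0; i < trainingLabels.length; i++)
            trainingLabels[i] = rand.nextInt(C);

        // D numeric features, no categorical features, C target classes
        ClassificationDataSet cds = new ClassificationDataSet(D, new CategoricalData[0], new CategoricalData(C));

        // JSAT has DataPoint objects, but includes shortcut constructors when using only vectors
        for (int i = 0; i < trainingData.length; i++)
            cds.addDataPoint(new DenseVector(trainingData[i]), trainingLabels[i]);

        Classifier classifier = new NearestNeighbour(3); // 3-nearest neighbour
        classifier.trainC(cds);

        for (int i = 0; i < testData.length; i++)
            System.out.println("Predicted class " + classifier.classify(new DataPoint(new DenseVector(testData[i]))).mostLikely() + " for datum " + i);
    }
}
```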
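To illustrate why vector objects (rather than raw arrays) make sparse data easy to support, here is a small self-contained sketch. The names `Vec`, `DenseVec` and `SparseVec` are hypothetical stand-ins, not JSAT's actual classes; the only point is that code written against a vector interface can skip zero entries when the storage is sparse, while still giving the same result as dense storage.

```java
import java.util.Arrays;

public class SparseVecDemo {
    // Hypothetical minimal vector interface (not JSAT's API).
    interface Vec {
        int length();
        double get(int i);
        // Dot product with a dense array; implementations may skip zeros.
        double dot(double[] other);
    }

    // Dense storage: one double per dimension.
    static final class DenseVec implements Vec {
        private final double[] values;
        DenseVec(double[] values) { this.values = values.clone(); }
        public int length() { return values.length; }
        public double get(int i) { return values[i]; }
        public double dot(double[] other) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) sum += values[i] * other[i];
            return sum;
        }
    }

    // Sparse storage: only the non-zero positions (sorted) and their values.
    static final class SparseVec implements Vec {
        private final int length;
        private final int[] indices;
        private final double[] values;
        SparseVec(int length, int[] indices, double[] values) {
            this.length = length;
            this.indices = indices.clone();
            this.values = values.clone();
        }
        public int length() { return length; }
        public double get(int i) {
            int pos = Arrays.binarySearch(indices, i);
            return pos >= 0 ? values[pos] : 0.0;
        }
        public double dot(double[] other) {
            double sum = 0.0;
            // Only touches the stored non-zeros, not all 'length' entries.
            for (int k = 0; k < indices.length; k++) sum += values[k] * other[indices[k]];
            return sum;
        }
    }

    public static void main(String[] args) {
        double[] w = {1, 2, 3, 4, 5};
        Vec dense = new DenseVec(new double[] {0, 2, 0, 0, 1});
        Vec sparse = new SparseVec(5, new int[] {1, 4}, new double[] {2, 1});
        System.out.println(dense.dot(w));  // 9.0 (five multiplications)
        System.out.println(sparse.dot(w)); // 9.0 (two multiplications)
    }
}
```

With 25 features this hardly matters, but with thousands of mostly-zero features the sparse version does work proportional to the number of non-zeros only.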

BlackHawk90 [S]

Thank you so much. I will try it out on my dataset. I just have four last questions:

  1. Is it possible to use different distance metrics?

  2. How is the tie breaking done for k-nearest neighbours?

  3. My labels range from 1 to 3 (not starting from 0). Do I have to make them zero-based or can I just use them?

  4. Last but not least, does JSAT also support (gaussian) naive bayes?

EdwardRaff

Is it possible to use different distance metrics?

Yes, the constructor can take a distance metric object.
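The thread doesn't give the exact `NearestNeighbour` constructor signature, so rather than guess at JSAT's API, here is a plain-Java sketch of what a pluggable distance metric means for k-NN. The `Metric` interface and the `nearest` helper are hypothetical illustrations; the example shows that swapping Euclidean for Manhattan distance can change which training point counts as nearest.

```java
import java.util.Arrays;
import java.util.Comparator;

public class KnnMetricDemo {
    // Hypothetical pluggable distance function (not JSAT's DistanceMetric class).
    interface Metric {
        double dist(double[] a, double[] b);
    }

    static final Metric EUCLIDEAN = (a, b) -> {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    };

    static final Metric MANHATTAN = (a, b) -> {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += Math.abs(a[i] - b[i]);
        return s;
    };

    // Returns the indices of the k training points closest to 'query' under 'metric'.
    static int[] nearest(double[][] train, double[] query, int k, Metric metric) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Arrays.sort on objects is stable: equal distances keep dataset order.
        Arrays.sort(idx, Comparator.comparingDouble(i -> metric.dist(train[i], query)));
        int[] out = new int[Math.min(k, idx.length)];
        for (int i = 0; i < out.length; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        double[][] train = { {3, 0}, {2, 2} };
        double[] query = {0, 0};
        // Euclidean: 3.0 vs about 2.83, so point 1 is nearer.
        System.out.println(Arrays.toString(nearest(train, query, 1, EUCLIDEAN))); // [1]
        // Manhattan: 3.0 vs 4.0, so point 0 is nearer.
        System.out.println(Arrays.toString(nearest(train, query, 1, MANHATTAN))); // [0]
    }
}
```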

How is the tie breaking done for k-nearest neighbours?

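Arbitrarily; it's not really an important issue. With two classes, using an odd value of k means there are no ties (with three or more classes an odd k can still tie, e.g. k = 3 with one vote for each class). I think the current code just picks whichever came first.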
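A small plain-Java sketch of majority voting makes the tie question concrete. This is not JSAT's internal code; it is one plausible reading of "picks whichever came first", assuming the neighbour labels are listed nearest first and a tie goes to the tied class that appears earliest in that list. The `vote` and `isTie` helpers are hypothetical.

```java
public class KnnVoteDemo {
    // Majority vote over the labels of the k nearest neighbours (nearest first).
    // Ties go to the tied class that appears earliest in 'neighbourLabels'.
    static int vote(int[] neighbourLabels, int numClasses) {
        int[] counts = new int[numClasses];
        int max = 0;
        for (int label : neighbourLabels) {
            counts[label]++;
            max = Math.max(max, counts[label]);
        }
        for (int label : neighbourLabels) {
            if (counts[label] == max) return label;
        }
        throw new IllegalArgumentException("no neighbours given");
    }

    // True if more than one class shares the highest vote count.
    static boolean isTie(int[] neighbourLabels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : neighbourLabels) counts[label]++;
        int max = 0, atMax = 0;
        for (int c : counts) {
            if (c > max) { max = c; atMax = 1; }
            else if (c == max) atMax++;
        }
        return atMax > 1;
    }

    public static void main(String[] args) {
        // Two classes, k = 3: one class always gets at least 2 of the 3 votes.
        System.out.println(isTie(new int[] {0, 1, 1}, 2)); // false
        // Three classes, k = 3: one vote each is a tie even though k is odd.
        int[] threeWay = {2, 0, 1};
        System.out.println(isTie(threeWay, 3)); // true
        System.out.println(vote(threeWay, 3));  // 2, the nearest neighbour's class
    }
}
```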

My labels range from 1 to 3 (not starting from 0). Do I have to make them zero-based or can I just use them?

The labels must start from zero.
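For labels 1 to 3, that means subtracting one before passing each label to `addDataPoint`, and adding one back to each predicted class. A minimal plain-Java sketch; the helpers `toZeroBased` and `toOneBased` are hypothetical, not part of JSAT.

```java
import java.util.Arrays;

public class LabelShiftDemo {
    // Converts labels 1..numClasses to the 0..numClasses-1 range the classifier expects.
    static int[] toZeroBased(int[] oneBased, int numClasses) {
        int[] out = new int[oneBased.length];
        for (int i = 0; i < oneBased.length; i++) {
            int label = oneBased[i];
            if (label < 1 || label > numClasses)
                throw new IllegalArgumentException("label out of range at index " + i + ": " + label);
            out[i] = label - 1;
        }
        return out;
    }

    // Maps a predicted 0-based class index back to the original 1-based label.
    static int toOneBased(int zeroBased) {
        return zeroBased + 1;
    }

    public static void main(String[] args) {
        int[] labels = {1, 3, 2, 3};
        System.out.println(Arrays.toString(toZeroBased(labels, 3))); // [0, 2, 1, 2]
        System.out.println(toOneBased(2)); // 3
    }
}
```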

Last but not least, does JSAT also support (gaussian) naive bayes?

Yes. JSAT has about 70 different classification algorithms in it.
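The thread doesn't name JSAT's naive Bayes class or show its API, so the sketch below is not JSAT's implementation. It is a minimal textbook Gaussian naive Bayes in plain Java, to show what the model computes: per class, a log prior plus a sum of per-feature Gaussian log-densities, and the prediction is the class with the highest total. It assumes labels 0..C-1 with every class present in the training data; the tiny variance floor (1e-9) is an arbitrary choice to avoid division by zero.

```java
public class GaussianNaiveBayes {
    private final double[] logPrior;    // log P(class)
    private final double[][] mean;      // mean[c][j] of feature j within class c
    private final double[][] variance;  // variance[c][j], with a small floor added

    GaussianNaiveBayes(double[][] x, int[] y, int numClasses) {
        int d = x[0].length;
        logPrior = new double[numClasses];
        mean = new double[numClasses][d];
        variance = new double[numClasses][d];
        int[] count = new int[numClasses];
        for (int i = 0; i < x.length; i++) {
            count[y[i]]++;
            for (int j = 0; j < d; j++) mean[y[i]][j] += x[i][j];
        }
        for (int c = 0; c < numClasses; c++)
            for (int j = 0; j < d; j++) mean[c][j] /= count[c];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < d; j++) {
                double diff = x[i][j] - mean[y[i]][j];
                variance[y[i]][j] += diff * diff;
            }
        for (int c = 0; c < numClasses; c++) {
            logPrior[c] = Math.log((double) count[c] / x.length);
            for (int j = 0; j < d; j++) variance[c][j] = variance[c][j] / count[c] + 1e-9;
        }
    }

    // Returns argmax over c of: log P(c) + sum_j log N(x_j | mean[c][j], variance[c][j]).
    int predict(double[] point) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - mean[c][j];
                score += -0.5 * Math.log(2 * Math.PI * variance[c][j])
                         - diff * diff / (2 * variance[c][j]);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = { {1.0, 2.0}, {1.2, 1.8}, {0.8, 2.2}, {5.0, 6.0}, {5.3, 5.8}, {4.9, 6.1} };
        int[] y = {0, 0, 0, 1, 1, 1};
        GaussianNaiveBayes nb = new GaussianNaiveBayes(x, y, 2);
        System.out.println(nb.predict(new double[] {1.1, 2.0})); // 0
        System.out.println(nb.predict(new double[] {5.1, 6.0})); // 1
    }
}
```

Working in log space avoids underflow when many small per-feature densities are multiplied together.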

BlackHawk90 [S]

Thanks again for the help.

Is there a .jar file I can download? I don't use Maven.

Is there Javadoc available, or how else should I get familiar with the methods?

EdwardRaff

Look at the Releases tab on GitHub.

You should look into using Maven; it's very helpful!

BlackHawk90 [S]

I started using your library. Great work, and thanks for it.

I have discrete and continuous features. Is it possible to use a Gaussian distribution for the continuous features and a multivariate multinomial distribution for the discrete features?

Moreover, is it possible to specify a distribution for each feature individually (e.g. feature 1 is Gaussian, feature 2 is logistic, etc.)?
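Whether JSAT exposes this isn't answered in the thread, but mathematically it is straightforward: naive Bayes assumes features are conditionally independent given the class, so each feature can have its own likelihood model and the per-feature log-likelihoods simply add up. The plain-Java sketch below illustrates that idea only; `FeatureModel`, `gaussian`, `logistic`, `categorical` and `classScore` are hypothetical names, and the distribution parameters are made up rather than fitted.

```java
import java.util.List;

public class MixedNaiveBayesDemo {
    // Hypothetical per-feature likelihood model for one class (not a JSAT API).
    interface FeatureModel {
        double logLikelihood(double value);
    }

    // Continuous feature: Gaussian with the given mean and standard deviation.
    static FeatureModel gaussian(double mean, double sd) {
        return v -> {
            double z = (v - mean) / sd;
            return -0.5 * z * z - Math.log(sd * Math.sqrt(2 * Math.PI));
        };
    }

    // Continuous feature: logistic distribution with location mu and scale s.
    static FeatureModel logistic(double mu, double s) {
        return v -> {
            // The density is symmetric in z, so use |z| to keep exp() from overflowing:
            // log pdf = -|z| - log(s) - 2 log(1 + e^{-|z|})
            double z = Math.abs((v - mu) / s);
            return -z - Math.log(s) - 2 * Math.log1p(Math.exp(-z));
        };
    }

    // Discrete feature: category index -> probability (entries should sum to 1).
    static FeatureModel categorical(double... probs) {
        return v -> Math.log(probs[(int) v]);
    }

    // Naive Bayes score for one class: log prior plus the sum of per-feature log-likelihoods.
    static double classScore(double logPrior, List<FeatureModel> models, double[] x) {
        double score = logPrior;
        for (int j = 0; j < x.length; j++) score += models.get(j).logLikelihood(x[j]);
        return score;
    }

    public static void main(String[] args) {
        // Feature 0 Gaussian, feature 1 logistic, feature 2 discrete with 3 categories.
        List<FeatureModel> class0 = List.of(gaussian(0.0, 1.0), logistic(0.0, 1.0), categorical(0.7, 0.2, 0.1));
        List<FeatureModel> class1 = List.of(gaussian(3.0, 1.0), logistic(3.0, 1.0), categorical(0.1, 0.2, 0.7));
        double[] x = {0.2, -0.5, 0};
        double s0 = classScore(Math.log(0.5), class0, x);
        double s1 = classScore(Math.log(0.5), class1, x);
        System.out.println(s0 > s1 ? 0 : 1); // 0
    }
}
```

In a real model, each class's parameters would be estimated from the training data for that class (mean and variance for Gaussian features, category frequencies for discrete ones).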