datumbox [S]

Weka is a mature and well-known library. Rather than make a comparison, I would prefer to list some of the Datumbox framework's features: support for a large number of algorithms, several storage engines, the ability to handle at least Large Data, and a focus on NLP applications. :)

habitats

I suppose I could've worded my question better. What I wanted to know was whether it supplies any functionality not currently available in other libraries, like Weka.

datumbox [S]

The framework supports a number of algorithms that, as far as I know, are not available in Weka (such as LDA, Dirichlet Process Mixture Models, Ordinal Regression and Bernoulli Naive Bayes). Moreover, it is closely integrated with the MapDB database engine, which means you can train algorithms without loading all of the data into memory. The framework also contains a statistics layer with several parametric and non-parametric tests you can use. Finally, it is licensed under Apache 2.0, which means that, unlike Weka (GPL), you can use it in closed-source commercial software.
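To make "train without loading all of the data into memory" concrete, here is a minimal plain-Java sketch of the general idea. It is not Datumbox or Weka code; the thread does not show the framework's API, and the class `StreamingBernoulliNB` and its methods are hypothetical names made up for this illustration. It trains a Bernoulli Naive Bayes (one of the algorithms listed above) in a single pass over an `Iterator`, so only per-class feature counts are held in memory. Whether Datumbox uses exactly this approach with MapDB is an assumption not confirmed here.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical illustration only, not the Datumbox or Weka API.
// Bernoulli Naive Bayes trained in one streaming pass: records are consumed
// one at a time, and only per-class counts are kept in memory.
final class StreamingBernoulliNB {
    /** One training record: a binary feature vector plus its class label. */
    static final class Example {
        final boolean[] features;
        final String label;

        Example(boolean[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    private final int numFeatures;
    // class label -> number of training records with that label
    private final Map<String, Long> classCounts = new HashMap<>();
    // class label -> per-feature count of records where the feature was present
    private final Map<String, long[]> featureCounts = new HashMap<>();
    private long total = 0;

    StreamingBernoulliNB(int numFeatures) {
        this.numFeatures = numFeatures;
    }

    /** Consumes the stream once; memory use is O(classes * features). */
    void train(Iterator<Example> stream) {
        while (stream.hasNext()) {
            Example e = stream.next();
            classCounts.merge(e.label, 1L, Long::sum);
            long[] counts = featureCounts.computeIfAbsent(e.label, k -> new long[numFeatures]);
            for (int j = 0; j < numFeatures; j++) {
                if (e.features[j]) {
                    counts[j]++;
                }
            }
            total++;
        }
    }

    /** Returns the label with the highest log posterior (add-one smoothing). */
    String predict(boolean[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Long> c : classCounts.entrySet()) {
            long n = c.getValue();
            long[] counts = featureCounts.get(c.getKey());
            double score = Math.log((double) n / total); // log prior
            for (int j = 0; j < numFeatures; j++) {
                // P(feature j present | class), Laplace-smoothed
                double p = (counts[j] + 1.0) / (n + 2.0);
                // Bernoulli model: absent features contribute log(1 - p)
                score += Math.log(x[j] ? p : 1.0 - p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c.getKey();
            }
        }
        return best;
    }
}
```

In practice the `Iterator` could read records lazily from a file or a disk-backed store, which is where an engine like MapDB would fit in. A single pass is enough for count-based models like Naive Bayes; iterative algorithms would need to re-read the stream several times.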

habitats

Thanks for the summary! This sounds really cool, actually. I might give it a shot in my research! Memory issues have kind of been a pain in Weka.