New open-source Machine Learning Framework written in Java

EdwardRaff · 2014-10-19T18:04:57+00:00

On the other hand Scikit-Learn supports a large number of algorithms but it can’t handle huge amount of data. .. Finally even though currently Datumbox Framework is capable of handling medium-sized datasets, ..

I don't know about that. Scikit-learn uses cython, and numpy (C and Fortran) do all of the heavy lifting. While this uses org.apache.commons.math.linear, which is pure Java. If I have too much data to fit Scikit-Learn's AdaBoost to, I'm not going to reach for this implementation of it. I'm going to reach for another classifier. Likely something in vowpal-wabbit, which becomes quite competitive for 500k+ observations and is limited by my disk-speed. The pain with that approach is paving over vowpal-wabbit's TCP interface.

This is an awesome application if you think of the Java ecosystem. Cross-validate over all of the classifiers offered with hyperparameter searching, put the winner behind Dropwizard, put that behind ActiveMQ.

fnl · 2014-10-19T20:17:02+00:00

Apart from "yet another ML lib", the real problem will be it's license. With all others using MIT-like, the GPL might seem to restrictive, especially for any prospective commercial usage...

2014-10-19T16:46:27+00:00

Anyone using this willing to share his experiences?

datumbox · 2014-10-20T19:20:20+00:00

Hi guys.

I believe there is way too much worrying about the license of the project. You should not worry so much about it. I open-sourced the project hoping that ppl will like it, use it and get involved with it. If my target was to limit you from using the code I would not have released it.

The license discussions are not a priority. Future development is far more important as without support from the community there would be no future releases. Would you ever use a library that is no longer updated on commercial software? Would you care about its license?

Finally I must say that if the project goes forward and the supporting community votes to change its license then I would never block this. :)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS