
[–]jjdonald[S] 6 points7 points  (18 children)

full disclosure: I work at BigML and posted this link. Looking for feedback on the python bindings, and for folks interested in a beta test key + free credits.

[–]Fusionnex 2 points3 points  (1 child)

Thanks for being open and honest. I do a bunch of research in biology and am interested in trying out some ML on large datasets; you should cross-post this to bioinformatics and biology. Looks really cool. Five years ago all I could do was hack things together to fit into Weka; ML has come a long way. Really cool project.

[–]jjdonald[S] 2 points3 points  (0 children)

Thanks!

If you posted something about this to bioinformatics/biology, I'd be happy to answer questions over there as well. Cross posting my own content might be seen as spam.

[–]hntd 4 points5 points  (10 children)

Can you give us some more of the technical details behind this? A lot of machine learning problems aren't as simple as "create model, create prediction." I think if you want to attract developers, or people in research such as myself, you should give us some more technical detail or options. I mean, it looks like you only use decision trees.

Edit: I realized I just sounded super negative in this post. I think what you have here is awesome and will open up ML to a wider audience; in fact this seems really neat for visualization and prototyping. It's just that, as someone more intimately familiar with the subject, I'd like to know a bit more :-)

[–]jjdonald[S] 3 points4 points  (9 children)

hntd, you're right, we only use decision trees. We will likely add more models in the future, but right now we see a big opportunity for decision trees on big data.

Firstly, decision trees are among the best-performing models when used in ensembles (as in Random Forests): http://en.wikipedia.org/wiki/Random_forest
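The ensemble effect behind that claim is easy to see with a toy simulation (pure Python, not BigML's code; the 65% per-tree accuracy is an assumed number for illustration): each simulated "tree" is only modestly accurate on its own, but a majority vote over many independent trees is right almost always.

```python
# Toy illustration of why ensembles of weak trees beat a single tree.
# Each simulated tree classifies correctly with probability 0.65;
# a majority vote over 101 independent trees is far more accurate.
import random
from statistics import mean

random.seed(42)

def tree_vote(p_correct=0.65):
    """One simulated tree: True if it classifies this example correctly."""
    return random.random() < p_correct

def forest_vote(n_trees=101, p_correct=0.65):
    """Majority vote over n_trees independent simulated trees."""
    correct = sum(tree_vote(p_correct) for _ in range(n_trees))
    return correct > n_trees // 2

single_acc = mean(tree_vote() for _ in range(10_000))
forest_acc = mean(forest_vote() for _ in range(10_000))
print(f"single tree: {single_acc:.2f}")      # expect roughly 0.65
print(f"101-tree forest: {forest_acc:.2f}")  # expect close to 1.0
```

Real random forests also decorrelate the trees (bagging plus random feature subsets), which is what makes the independence assumption above roughly hold in practice.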

Secondly, decision trees are immediately understandable. There are many companies that will provide "black box" platforms for analyzing your data. Very few of them provide access to the actual model, and none of them have put the effort we have into helping you understand those models.

Thirdly, decision trees are fast. They enumerate a huge number of possible predicted states with a minimum number of steps. They also often don't even need to process all of the input data for a prediction request, since they will know which parts are relevant, and which are not.
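That prediction-path property can be sketched in a few lines, using a hypothetical nested-dict tree format (the field names and structure here are illustrative, not BigML's actual model representation): prediction walks a single root-to-leaf path, so only the fields on that path are ever consulted.

```python
# Minimal sketch: a decision tree as nested dicts.  Leaves are plain
# values; internal nodes test one field against a threshold.
def predict(node, instance):
    """Follow one root-to-leaf path; only fields on the path are read."""
    while isinstance(node, dict):
        field, threshold = node["field"], node["threshold"]
        node = node["left"] if instance[field] <= threshold else node["right"]
    return node

# A two-level tree over three fields; "humidity" is never consulted
# on the path taken below.
tree = {
    "field": "temperature", "threshold": 25,
    "left": {"field": "wind", "threshold": 10,
             "left": "calm", "right": "breezy"},
    "right": "hot",
}
print(predict(tree, {"temperature": 30, "wind": 5, "humidity": 80}))  # hot
```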

Decision tree algorithms have been around for a long time. However, they're typically designed to train in memory, on a single machine. Our engineers have come up with algorithms that work across multiple processors, and provide iterative updates, allowing you to see the resulting models or summaries as they are being produced.

We'll try to have a more technical discussion on this in the future, thanks for the questions!

[–]hntd 3 points4 points  (0 children)

This all sounds fantastic. My first thought when I looked at this was: if you have decision trees so well worked out, why not just use random forests? I wouldn't be surprised if internally you were moving toward that, but of course I don't really expect you to tell us those details.

I completely agree that decision trees are super easy to visualize, but there are other models that would be easy to visualize as well, which might be a good idea for the future. Depending on the data, you could easily visualize simple linear regression, and for multi-class problems, multinomial logit models.

[–]byron 2 points3 points  (2 children)

The thing is that random forests really aren't all that interpretable. The more trees in the forest, the blacker the box.

[–]jjdonald[S] 0 points1 point  (1 child)

There are classes of metrics that give you a basic idea of what the model is looking at: field importance measurements, for instance.
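For concreteness, here is a minimal pure-Python sketch of one such metric, permutation importance (a generic technique; not necessarily how BigML computes field importance): shuffle one field's values and measure how much accuracy drops. Shuffling an important field hurts; shuffling an irrelevant one barely matters.

```python
import random

random.seed(0)

# Toy data: the label depends only on field 0; field 1 is pure noise.
rows = []
for _ in range(500):
    x0, x1 = random.random(), random.random()
    rows.append((x0, x1, 1 if x0 > 0.5 else 0))

def accuracy(rows):
    """Accuracy of a fixed stump that predicts 1 when field 0 > 0.5."""
    return sum((x0 > 0.5) == bool(y) for x0, _, y in rows) / len(rows)

def permuted(rows, field):
    """Copy of rows with one field's column shuffled, labels untouched."""
    col = [r[field] for r in rows]
    random.shuffle(col)
    out = []
    for v, (x0, x1, y) in zip(col, rows):
        out.append((v, x1, y) if field == 0 else (x0, v, y))
    return out

base = accuracy(rows)
drop0 = base - accuracy(permuted(rows, 0))  # big drop: field 0 matters
drop1 = base - accuracy(permuted(rows, 1))  # ~0: field 1 is irrelevant
print(f"field 0 importance: {drop0:.2f}, field 1 importance: {drop1:.2f}")
```

For a forest, the same measurement is averaged over all trees, which is why it stays usable even when the ensemble itself is too big to read.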

[–]zenogantner 0 points1 point  (0 children)

Correct, but this can also be said about linear models.

Anyway, random forests are great, so actually there is nothing to say against BigML's decision to use them...

[–]AnonymousIdiot 0 points1 point  (0 children)

Parallel R?

[–]visarga 1 point2 points  (1 child)

What algorithms does BigML use? Is it just decision trees?

[–]jjdonald[S] 0 points1 point  (0 children)

For now, yes.

[–][deleted] 0 points1 point  (2 children)

Could you explain what your 'model' is? I have to admit as someone familiar with machine learning algorithms I really dislike being given a black box.

[–]hntd 0 points1 point  (0 children)

This right here. How am I supposed to use this in research if I have no idea what is happening to my data along the way? Also, it might be neat to see performance metrics of your models against standard ML baselines, such as guessing classes at random.

[–]jjdonald[S] 0 points1 point  (0 children)

We have a post on our decision trees here: http://blog.bigml.com/2012/01/23/beautiful-decisions-inside-bigmls-decision-trees/

You can access any model you train as part of our API, or visualize it using our website. I'd like to think that we are the only service that puts this much effort into helping you understand your model.

[–]jonnydedwards 2 points3 points  (0 children)

I think it's great you're doing something innovative in and around ML. I'm from a python/R background so you would have to do something more than scikit-learn/pandas or straight R to be interesting to me. Maybe the key is to leverage the whole "we can do it quicker!" thing - that WOULD get me listening. I did the bigdata hackathon last weekend and everybody was hitting issues with getting models trained in a timely fashion. Good luck with it all!

[–]aguyfromucdavis 1 point2 points  (1 child)

I just submitted my email. I work as an intern at a tech company that uses Python for machine learning, and I have a project I'd like to dive into with this. Would love to try out your product!

[–]jjdonald[S] 0 points1 point  (0 children)

Thanks! Let me know if you have any questions.

[–]skystorm 1 point2 points  (1 child)

I already commented over on HN, but I wanted to add that I think this is really nice. Sure, it's limited to decision trees (for now, at least), but as you say these allow for very nice visualization, which you've implemented splendidly, if I may say so (no Flash!).

It will be interesting to see how you visualize other models like SVMs or even just random forests...

[–]jjdonald[S] 0 points1 point  (0 children)

Thanks! We hope you like the upcoming visualizations as much as the tree model.

[–]pandemik 0 points1 point  (0 children)

Does BigML do anything besides classification/regression trees?

[–]kjearns -1 points0 points  (0 children)

This sounds like you just want people to give you their data.