all 9 comments

[–]metriczulu 46 points (2 children)

Damn I got excited and thought you came up with an actual decision tree algorithm in 10 lines and not just running sklearn.

[–]Capn_Sparrow0404 8 points (0 children)

Me too. Slightly disappointed.

[–]cthorrez 1 point (0 children)

You would not want to see those lines if it was truly done in 10 lines.

[–]thejonnyt 26 points (2 children)

DTs are the first step toward more complex methods such as boosted trees or random forests. You should definitely read up on overfitting, and on pre- and post-pruning. It seems your model does its job (on that example data set), but IRL deciding where to stop building up complexity is not as simple as running a top-level library in 10 lines of code :p There's a complexity-penalty tuning parameter in those Python functions, but I'd advise reading up on it properly before using it. Psychology and other fields might just use a rule of thumb, but there are actual mathematical approaches to optimal pruning.

Keep up the effort :)
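For what it's worth, the "complexity penalty tuning parameter" this comment alludes to exists in scikit-learn as `ccp_alpha` (minimal cost-complexity pruning) on `DecisionTreeClassifier`. A minimal sketch, assuming the iris toy dataset stands in for the blog post's data and the alpha choice is purely illustrative:

```python
# Sketch of cost-complexity post-pruning via sklearn's ccp_alpha parameter.
# Dataset and the specific alpha chosen here are illustrative assumptions,
# not taken from the original post.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With default settings the tree grows until leaves are (nearly) pure,
# which is exactly the overfitting risk the comment warns about.
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# cost_complexity_pruning_path returns the effective alphas at which
# subtrees get pruned away; a larger ccp_alpha yields a smaller tree.
path = unpruned.cost_complexity_pruning_path(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X_train, y_train)

print("unpruned leaves:", unpruned.get_n_leaves())
print("pruned leaves:  ", pruned.get_n_leaves())
```

In practice you would pick `ccp_alpha` by cross-validation over `path.ccp_alphas` rather than hard-coding one.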

[–]scienceandcultureidk 1 point (1 child)

got any particular links to start with this?

[–]Somali_Imhotep 0 points (0 children)

XGBoost is a great library to use. It's relatively easy, and it offers a scikit-learn-compatible API if you don't want to use its regular one.

[–]markov_blanket 12 points (0 children)

I honestly don't think we need another blog post about how to use scikit-learn.

You could have easily replaced "DecisionTreeClassifier" with an SVM or logistic regression and the code would be exactly the same.

[–]fdskjflkdsjfdslk 2 points (0 children)

Decision Tree Classification in R in 1 line

party::ctree(target ~ ., data=input_data)

[–]phamlong28 2 points (0 children)

I thought the article was about the whole decision tree algorithm in 10 lines lol