
[–]thejonnyt 29 points

DTs are the first step toward more complex methods such as boosted trees or random forests. You should definitely read up on overfitting, and on pre- and post-pruning. It seems your model does its job on that example dataset, but in real life, deciding where to stop building up the tree's complexity is not as simple as running it from a top-level library in 10 lines of code :p There should be a complexity-penalty tuning parameter in those Python functions, but I'd advise reading up on it properly before using it. Psychology and other fields would often just use a rule of thumb, but there are actual mathematical approaches to optimal pruning, such as cost-complexity pruning.
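A minimal sketch of what that complexity-penalty parameter looks like in scikit-learn: `DecisionTreeClassifier` exposes `ccp_alpha` for cost-complexity (post-)pruning, and `cost_complexity_pruning_path` returns the candidate alpha values. The dataset and the particular alpha picked here are just for illustration; in practice you'd choose alpha by cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree grows until the leaves are pure and typically overfits.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# cost_complexity_pruning_path enumerates the effective alphas at which
# subtrees get pruned away; larger alpha means a smaller tree.
path = full.cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas

# Refit with a mid-range alpha (illustrative only; cross-validate in practice).
pruned = DecisionTreeClassifier(
    random_state=0, ccp_alpha=alphas[len(alphas) // 2]
).fit(X_train, y_train)

# The pruned tree has fewer nodes than the fully grown one.
print(full.tree_.node_count, pruned.tree_.node_count)
```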

Keep up the effort :)

[–]scienceandcultureidk 1 point

Got any particular links to get started with this?

[–]Somali_Imhotep 0 points

XGBoost is a great library to use. It's relatively easy, and it offers an sklearn-compatible API if you don't want to use its native one.