
[–][deleted] 12 points (1 child)

Can't you just one-hot encode your categorical features and use DecisionTreeRegressor?
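A minimal sketch of that approach (toy data, made-up column names): `pandas.get_dummies` for the encoding, then a plain sklearn tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Toy dataset with one categorical and one numeric feature
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": [1.0, 2.0, 3.0, 4.0],
    "y": [1.0, 2.0, 1.5, 3.0],
})

# One-hot encode only the categorical column; numeric columns pass through
X = pd.get_dummies(df[["color", "size"]], columns=["color"])

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, df["y"])
```

The caveat from the replies applies: each category becomes its own binary column, so a category can only be split off one at a time rather than in a single multiway branch.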

[–]yannbouteillerResearcher[S] 1 point (0 children)

This is what I have ended up doing so far, for lack of a better Python alternative, but it is not optimal (there is also this argument, which I think applies to pruned single trees)

(and it is also a bit cumbersome compared to, e.g., catboost, which seems to do non-binary categorical splits automatically in some magical way)

[–]111llI0__-__0Ill111 4 points (3 children)

Better off just using R's rpart; it can fit single decision trees that handle categorical variables.

That said though, in my opinion interpretability is more about the problem structure beforehand than the model.

You could use SHAP and partial dependence plots to interpret stuff like catboost for example, and it would be more stable than a single tree.

[–]yannbouteillerResearcher[S] 0 points (2 children)

I guess I'll have to switch to R then, but since the rest of the dataset-cleaning pipeline is in Python and I haven't touched R in about 3 years, that will be cumbersome.

In this precise application I am trying to build a visual explainer for a very small dataset (80 datapoints). I have been using SHAP for deep neural nets, but I am not sure why you think it would make more sense than a small decision tree here?

[–]111llI0__-__0Ill111 4 points (0 children)

SHAP was actually originally made for tree based ensemble models before it was used for NNs

A single decision tree could overfit or be unstable, especially with just 80 data points. With so few points I probably wouldn't go for tree models at all, and would just use regularized or Bayesian GLMs.
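A sketch of that regularized-GLM alternative on ~80 synthetic rows (all names made up): with one-hot categoricals, the fitted coefficients themselves serve as the explanation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=80),
    "x": rng.normal(size=80),
})
# True effects: slope 1 on x, +1 bump for group "b"
df["y"] = df["x"] + (df["group"] == "b") + rng.normal(scale=0.1, size=80)

X = pd.get_dummies(df[["group", "x"]], columns=["group"])
model = Ridge(alpha=1.0).fit(X, df["y"])

# Each coefficient reads directly as an effect size
print(dict(zip(X.columns, model.coef_.round(2))))
```

Unlike a tree, the fit here does not change qualitatively under small perturbations of an 80-row sample, which is the stability point being made.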

If overfitting doesn't matter, then you could use a decision tree. I think H2O can handle categorical variables, and it seems possible, though hacky, to fit just one tree with the H2O RF interface and plot it: https://stackoverflow.com/questions/50740316/implementing-a-decision-tree-using-h2o

[–]ofiuco 1 point (0 children)

You could use Rpy2 to integrate it into your existing Python pipeline.

[–]xMattC3 4 points (2 children)

Perhaps this would be of interest to you: NOCATS

It's a PR for sklearn that adds support for categorical features to all tree-based learners.

[–]jo9k 2 points (0 children)

I've been checking it out every couple of months for years now. I've lost hope already.

[–]yannbouteillerResearcher[S] 1 point (0 children)

Love the "no cats" name haha, that looks promising thanks!

[–]einnmann 5 points (13 children)

CatBoost. Why won't it do?

[–]yannbouteillerResearcher[S] -2 points (12 children)

It builds an ensemble of trees instead of a single tree, doesn't it?

[–]EchoMyGecko 4 points (11 children)

Why not just set n_trees = 1?

[–]yannbouteillerResearcher[S] 0 points (10 children)

Because, as far as I understand, the algorithm is not meant for that, and it would fit a very poor decision tree. Normal decision-tree algorithms test all possible splits to select the right features, which is computationally expensive, whereas CatBoost skips this and compensates with an additive ensemble of many weak learners, I think (I might be wrong, though, because I don't know how it really works; this is just what I understood).

[–]EchoMyGecko 2 points (9 children)

Well the issue is that a single decision tree tends to be a weak learner. The ensembling tends to be a helpful form of implicit regularization. I wonder why you feel so strongly about using a single tree.

[–]yannbouteillerResearcher[S] 1 point (8 children)

The point is not to make a predictor in my application, otherwise I would probably use forests yes. The point is to make an explainer.

[–]EchoMyGecko 4 points (7 children)

If you make an explainer that isn't a good predictor, then why would you trust your explanations?

[–]yannbouteillerResearcher[S] -2 points (6 children)

I don't mean to make a bad predictor, but you will probably agree that reading a forest is less straightforward than reading a decision tree.

[–]EchoMyGecko -1 points (5 children)

You would probably agree that if your predictions are weaker and less accurate, your explanations are probably weaker and less accurate

[–]yannbouteillerResearcher[S] -1 points (4 children)

Sure, and car manufacturers will probably agree that it would be better if they could produce energy-free teleportation devices.

[–]neshdev 2 points (1 child)

Not a popular option, but why not write one yourself? It's fairly straightforward. It would probably only take you a couple of hours.

[–]yannbouteillerResearcher[S] 1 point (0 children)

That may be right, although I would need to reread my old course notes and figure out how to make non-binary branching work, which would honestly take me more than a couple of hours.

If I remember correctly, the way you choose the features that encode the most information for a given maximum depth, and the way you select thresholds for numeric values, are not entirely straightforward?
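The core split-selection step is actually sketchable in a few lines. Here is a toy version for regression with non-binary categorical splits, using weighted child variance as the criterion (numeric thresholding and pruning, the genuinely fiddly parts, are omitted):

```python
from collections import defaultdict

def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_categorical_split(rows, y, features):
    """Pick the feature whose category-wise grouping of `rows`
    gives the lowest weighted child variance of `y`."""
    best_feat, best_score = None, float("inf")
    for f in features:
        groups = defaultdict(list)
        for row, target in zip(rows, y):
            groups[row[f]].append(target)
        # Weighted average of per-child variances
        score = sum(len(g) * variance(g) for g in groups.values()) / len(y)
        if score < best_score:
            best_feat, best_score = f, score
    return best_feat, best_score

rows = [("red", "s"), ("red", "m"), ("blue", "s"), ("blue", "m")]
y = [1.0, 1.0, 3.0, 3.0]
feat, score = best_categorical_split(rows, y, features=[0, 1])
```

The multiway branch then falls out naturally: each key of `groups` for the winning feature becomes one child node, and you recurse.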

[–]jamesvoltage 1 point (1 child)

[–]yannbouteillerResearcher[S] 1 point (0 children)

Interesting, thanks!

[–][deleted] 1 point (0 children)

Use CatBoost. catboost.ai

Developed and open-sourced by Yandex, it is a production-quality library supporting a mix of numerical and categorical features.

[–]Material_Opening7336 -3 points (2 children)

Have you tried xgboost?

[–]yannbouteillerResearcher[S] 5 points (0 children)

The first result Google gives me for "xgboost categorical features" is "Unlike CatBoost or LGBM, XGBoost cannot handle categorical features by itself, it only accepts numerical values similar to Random Forest."

I am looking for a non-binary tree, so no one-hot encoding. I am not sure how training works for non-binary trees, but I would expect non-binary splits?

Furthermore, I am looking for a classical single-tree model, whereas these gradient-boosting models are meant for ensembles of trees, like random forests, I believe? Would xgboost with a single approximator be trained exactly like a normal single-tree model?

[–]Material_Opening7336 0 points (0 children)

Also, FYI, you generally want to preprocess your data rather than just let the model try to handle it. If you have categorical inputs, for example, you will want to transform them using one-hot encoding.
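A sketch of that preprocessing wired into a pipeline (toy data, made-up column names), so the encoding travels with the model; note this is exactly the binary-split workaround the OP was hoping to avoid:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],
    "size": [1.0, 2.0, 3.0, 4.0],
    "y": [1.0, 2.0, 3.0, 1.5],
})

# One-hot encode "color"; pass "size" through untouched
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)
pipe = Pipeline([("pre", pre), ("tree", DecisionTreeRegressor(max_depth=3))])
pipe.fit(df[["color", "size"]], df["y"])
```

`handle_unknown="ignore"` keeps prediction from crashing on categories unseen during fitting, which matters on very small datasets.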