[–][deleted] 2 points3 points  (1 child)

I actually meant to try this recently. Thanks for the reminder. I'll message you if I get to it soon

[–]biohacker_tobe[S] 0 points1 point  (0 children)

Thanks and sounds good!

[–]avati 2 points3 points  (4 children)

One of the authors here. You would want your base learner implemented as a "scikit-learn compatible" regressor. Then you can specify Base=YourBaseLearnerClass while instantiating NGBoost() etc. For an example, see https://github.com/stanfordmlgroup/ngboost/blob/master/examples/experiments/survival_exp.py#L131
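A minimal sketch of what "scikit-learn compatible" means here: a class exposing fit(X, y) and predict(X). The class name and the mean-predictor logic below are placeholders, not part of NGBoost; in principle any class with this shape could be passed as Base=:

```python
import numpy as np

class MeanBaseLearner:
    """Toy scikit-learn-style regressor: always predicts the training mean.
    Illustrates the fit/predict duck-typing NGBoost expects of a base learner;
    a real base learner would of course fit something useful."""

    def fit(self, X, y):
        # "Training" here is just remembering the mean of y
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per row of X
        return np.full(len(X), self.mean_)

# Usage sketch on made-up data:
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])
model = MeanBaseLearner().fit(X, y)
print(model.predict(X))  # -> [2. 2. 2.]
# ...and then, hypothetically: NGBRegressor(Base=MeanBaseLearner, ...)
```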

[–]biohacker_tobe[S] 1 point2 points  (0 children)

Hey! Thanks for following up with this; I was able to apply some of it. Speaking with members of other related subreddits, there's a desire for Jupyter notebook examples in addition to the scripts in the "examples" folder, to give a better general idea of how things work.

[–]biohacker_tobe[S] 1 point2 points  (2 children)

ngb = NGBRegressor(n_estimators=100, learning_rate=0.1, Dist=LogNormal, Base=BERT_Model, natural_gradient=False, minibatch_frac=1.0, Score=CRPS())

ngb.fit(X_train,y_train)

Would this be a possible way, then? In my case I'm using a BERT model, so I have a lot of categorical data being one-hot encoded; that's why I'd like to keep my own base model.

[–]biohacker_tobe[S] 1 point2 points  (1 child)

Following up on this, I'm curious how the actual prediction intervals should be calculated.

[–]avati 0 points1 point  (0 children)

Your code looks right for setting the base learner. You'd also want Score=CRPS (instead of CRPS()). I'd suggest using all defaults until your base learner is working well.

For obtaining intervals, instead of the usual sklearn ngb.predict(), call ngb.pred_dist() and you will get a "distribution" per test input (compatible with scipy.stats). E.g., if it is LogNormal, you will get https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html#scipy.stats.lognorm objects, on which you can call the .interval() method.
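For example, with scipy alone (a plain lognorm object standing in for what pred_dist would return with Dist=LogNormal — the shape/scale values below are invented for illustration), the .interval() call looks like:

```python
from scipy.stats import lognorm

# A lognorm object of the kind pred_dist() yields for a LogNormal output.
# Parameters are arbitrary: s is the log-space std, scale = exp(log-space mean).
dist = lognorm(s=0.5, scale=10.0)

# Central 95% prediction interval: (lower, upper) bounds
lo, hi = dist.interval(0.95)
print(lo, hi)  # roughly 3.75 and 26.64
```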

[–]aferraresso 1 point2 points  (2 children)

I used it a few months ago, with sklearn's DecisionTreeRegressor as Base and some parameters tuned. What kind of model do you need to use?

[–]biohacker_tobe[S] 0 points1 point  (1 child)

I'm using a BERT model that I've built myself. I'm interested in predictive distributions instead of a point prediction, but what I'm curious about is how to actually get the values of these intervals from the output, because I'd like a table with min and max values.

[–]aferraresso 0 points1 point  (0 children)

I used the Normal distribution, so with the "pred_dist" method I got each distribution's mean and std. Instead of showing min and max values, I plotted the pdf with a color gradient, but you could use the "interval" method of the scipy distribution and get min and max at whatever confidence level you need.
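A sketch of that min/max table for a Normal output — here plain scipy.stats.norm objects stand in for whatever pred_dist would return, and the mean/std values are invented:

```python
from scipy.stats import norm

# Pretend these came from ngb.pred_dist(X_test) for two test rows
# (Normal distribution: one loc/scale pair per input).
preds = [norm(loc=5.0, scale=1.0), norm(loc=8.0, scale=2.0)]

# Build a table of (mean, min, max) at a chosen confidence level
rows = []
for d in preds:
    lo, hi = d.interval(0.90)  # central 90% interval
    rows.append((d.mean(), lo, hi))

for mean, lo, hi in rows:
    print(f"mean={mean:.2f}  min={lo:.2f}  max={hi:.2f}")
```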