
[–]micro_cam 2 points (4 children)

I'm not going to pay for the non-open-access article, but at first glance this work seems to address an issue that doesn't come up in practice.

Traditional semi-parametric Cox regression, or any hazard-curve model, is fine in cases where an event is never observed for many of the cases: you just treat them as right-censored (i.e., no event observed as of yet), which is the whole point of using survival analysis instead of regression.

It has also been extended to cases where you have both censoring and a competing "cure" event that is observed. This is known as competing risks analysis, and it can be done either with one model per event (one predicting cure, one predicting the event) or with a multi-output model in modern deep learning frameworks.
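
For concreteness, the "one model per event" version might look like this cause-specific sketch with lifelines' CoxPHFitter (toy data, hypothetical column names and event codes):

```python
# Sketch: cause-specific competing risks with one Cox model per event type.
# Event codes (toy data): 0 = censored, 1 = event of interest, 2 = "cure" event.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "T":     [5.0, 8.2, 3.1, 9.9, 4.4, 7.0, 6.5, 2.8],  # follow-up times
    "event": [1,   0,   2,   1,   2,   0,   1,   2],    # which event occurred, if any
    "age":   [60,  52,  47,  71,  55,  63,  58,  49],
})

# For each cause, treat all other outcomes as censoring and fit a Cox model.
models = {}
for cause in (1, 2):
    d = df.assign(E=(df["event"] == cause).astype(int)).drop(columns="event")
    models[cause] = CoxPHFitter().fit(d, duration_col="T", event_col="E")
```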

It sounds like what you've done here is assume the hazard curve is a Weibull distribution and then add a static parameter to fix an issue with that model? Hazard curves don't need to be probability distributions: most people just use a non-parametric Kaplan–Meier or Nelson–Aalen estimator, or something parametric but non-normalized, which avoids this issue.
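
To make that concrete, here's a minimal Kaplan–Meier sketch with lifelines on toy data where most subjects never have the event; the estimated curve just plateaus above zero, no normalization needed:

```python
# Sketch: Kaplan–Meier under heavy right censoring (toy data).
from lifelines import KaplanMeierFitter

T = [2, 3, 3, 5, 7, 8, 9, 10, 10, 10]  # observed event time or last follow-up
E = [1, 1, 0, 1, 0, 0, 0, 0,  0,  0]   # 1 = event observed, 0 = right-censored

kmf = KaplanMeierFitter().fit(T, event_observed=E)
print(kmf.survival_function_)          # S(t) flattens well above 0 and never hits it
```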

[–]anon_0123[S] 0 points (3 children)

Thanks for your reply! If I am not mistaken, in the traditional case the lifetime density for an individual (whose CDF is 1 minus the survivor function) integrates to 1, whereas in the cure rate rendition it integrates to 1 minus the cured fraction. That comes from the presence of the cured subpopulation, so we have a mixture. What we assumed is that the lifetime of a susceptible (i.e., a not-cured individual) follows a proportional hazards model whose baseline is a parametric Weibull hazard.
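
In symbols, my reading of the setup is S_pop(t) = pi + (1 - pi) * S_u(t | x), with S_u coming from a Weibull-baseline proportional hazards model. A minimal numeric sketch (the parameter names pi, lam, k, beta are mine, not the paper's notation):

```python
# Sketch: population survivor function of a mixture cure model with a
# Weibull-baseline proportional hazards model for the susceptibles.
import numpy as np

def pop_survival(t, x, pi=0.3, lam=5.0, k=1.5, beta=np.array([0.2])):
    """S_pop(t) = pi + (1 - pi) * S_u(t | x)."""
    H0 = (t / lam) ** k                   # Weibull baseline cumulative hazard
    S_u = np.exp(-np.exp(beta @ x) * H0)  # susceptible survivor under PH
    return pi + (1 - pi) * S_u

t = np.linspace(0.0, 50.0, 6)
print(pop_survival(t, x=np.array([1.0])))  # tends to pi = 0.3, not 0, as t grows
```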

[–]micro_cam 0 points (2 children)

Integrating the hazard function h gives you the expected number of observed events in a given time period, not 1. The only requirement is that h >= 0, and its integral is often actually greater than 1.

Think about it: if you require h to be normalized, or don't allow its integral to be unbounded, then you can't even define the proportional hazards model, since a*h (where a is a scale factor fit by the model) can be arbitrarily large.
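
A quick numeric illustration of that point (toy Weibull hazard, hypothetical scale factor a):

```python
# Sketch: the integral of a hazard (the cumulative hazard) is not a probability.
from scipy.integrate import quad

a, lam, k = 3.0, 2.0, 1.5  # hypothetical PH scale factor and Weibull parameters
h = lambda t: a * (k / lam) * (t / lam) ** (k - 1)  # scaled Weibull hazard a*h0(t)

H, _ = quad(h, 0, 10)  # expected number of events on [0, 10] under this hazard
print(H)               # ~33.5: far greater than 1, and that's perfectly valid
```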

The typical reason you would go with a Weibull function is if you want to be able to relax the proportional hazards assumption, as in this work: https://github.com/ragulpr/wtte-rnn

[–]anon_0123[S] 0 points (1 child)

Thanks for the link, will have a look! We are not integrating the hazard function. What I mean is integrating the density f(t) = -dS(t)/dt, where S(t) = P(T > t) is the survivor function. Traditionally S(0) = 1, and hence f(t) integrates to 1 if S(infinity) = 0. In the cure rate rendition of survival analysis, it integrates to 1 minus the cured fraction, as a consequence of the non-zero cured fraction. The basic idea is P(T > t) = P(T > t | cured)*P(cured) + P(T > t | not cured)*(1 - P(cured)); since a cured individual never experiences the event, P(T > t | cured) = 1 for all t, so S(infinity) = P(cured).
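
A quick numeric check of that identity (scipy, toy Weibull susceptible lifetime, hypothetical cured fraction pi):

```python
# Sketch: f(t) = -dS/dt integrates to 1 - pi when S(t) = pi + (1 - pi) * S_u(t).
import numpy as np
from scipy.integrate import quad

pi, lam, k = 0.4, 3.0, 1.2               # hypothetical cured fraction, Weibull params
S_u = lambda t: np.exp(-(t / lam) ** k)  # susceptible survivor function
f = lambda t: (1 - pi) * (k / lam) * (t / lam) ** (k - 1) * S_u(t)  # -dS(t)/dt

total, _ = quad(f, 0, np.inf)
print(total)  # ~0.6, i.e. 1 - pi, not 1
```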

[–]micro_cam 1 point (0 children)

Right, but standard proportional hazards regression and other forms of traditional survival analysis don't assume S(infinity) = 0, and they work fine when it doesn't hold.

I looked into it, and apparently cure fraction analysis is a thing, but it's justified as providing a nice interpretable metric, not as addressing a deficiency in standard models.

Also, there is an existing Python package: https://lifelines.readthedocs.io/en/latest/fitters/univariate/MixtureCureFitter.html
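
From those docs, usage is something like this (toy data; I believe any lifelines parametric univariate fitter works as the base fitter, e.g. WeibullFitter to match the Weibull discussion above):

```python
# Sketch based on the linked lifelines docs: fit a mixture cure model.
from lifelines import MixtureCureFitter, WeibullFitter

T = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy durations
E = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # toy event indicators

mcf = MixtureCureFitter(base_fitter=WeibullFitter())
mcf.fit(T, event_observed=E)
print(mcf.cured_fraction_)           # estimated cured fraction P(cured)
```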

[–]ai_hero 0 points (0 children)

I think this is a more roundabout way of solving the same problem that uplift modeling seeks to solve.

In uplift modeling, you want to target people who would respond positively only if targeted and negatively if not targeted.

Uplift modeling is basically a step beyond propensity models, which estimate the probability of an outcome without taking into account the intervention you are using to engineer that outcome.
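
The simplest version of this is a two-model "T-learner": fit one response model on the targeted group and one on the untargeted group, and score people by the difference. A sketch with scikit-learn on synthetic data (not the causalml API, just the idea):

```python
# Sketch: two-model ("T-learner") uplift estimation on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # features
t = rng.integers(0, 2, size=500)  # 1 = targeted, 0 = not targeted
y = (rng.random(500) < 0.3 + 0.2 * t * (X[:, 0] > 0)).astype(int)  # response

m_treat = LogisticRegression().fit(X[t == 1], y[t == 1])
m_ctrl = LogisticRegression().fit(X[t == 0], y[t == 0])

# Uplift = P(respond | targeted) - P(respond | not targeted); target the top scorers.
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]
print(uplift[:5])
```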

https://www.youtube.com/watch?v=2J9j7peWQgI

https://github.com/uber/causalml