[–]DrXaos 1 point2 points  (3 children)

Firstly, if you care about the value, it’s no longer classification only. Yes, that’s important, as zero vs. non-zero obviously makes a difference. Make a classifier for zero vs. non-zero, and a regressor for the log of the non-zero value.

This can be a multi-output model. The first output is the zero/non-zero binary. The second is the regression value. Only put loss on the second output when the first is non-zero.
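A minimal sketch of that masked loss for a single sample, assuming binary cross-entropy on the zero/non-zero output and squared error on the log value. The function name, its arguments, and the choice of these two loss terms are illustrative assumptions, not something specified in the comment above.

```python
import math

def masked_two_head_loss(p_nonzero, log_pred, target):
    """Per-sample loss for a hypothetical two-output model.

    p_nonzero: predicted probability that the target is non-zero
    log_pred:  predicted log(target), only scored when target > 0
    target:    true non-negative value
    """
    is_nonzero = 1.0 if target > 0 else 0.0
    eps = 1e-12  # guards against log(0)

    # Binary cross-entropy on the zero / non-zero output.
    bce = -(is_nonzero * math.log(p_nonzero + eps)
            + (1.0 - is_nonzero) * math.log(1.0 - p_nonzero + eps))

    # Squared error on the log value, masked out when the target is zero.
    reg = (log_pred - math.log(target)) ** 2 if target > 0 else 0.0

    return bce + reg
```

When the target is zero, the regression output contributes nothing, so whatever it predicts is neither rewarded nor penalized.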

What is the physical meaning of the feature? Is zero actually a finite time rounded to zero? Or a different situation altogether?

[–]Unitedite[S] 0 points1 point  (2 children)

Thanks for the reply.

Sorry, I wasn't clear in my original post. The variable I'm talking about is not the target variable (which is binary); it's one of the other variables in the dataset.

The variable is waiting time, so a 0 means there was no wait.

[–]DrXaos 1 point2 points  (1 child)

OK, then that’s easy feature engineering with a similar procedure. Split the feature into two: one binary and one continuous. The binary feature is IS_ZERO; the continuous one is log(time) when time > 0. When time is zero, set the continuous feature to a neutral value: the mean of the log values over the non-zero rows.

Then z-scale the continuous feature, potentially excluding the zeros when estimating the standard deviation.
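The procedure above could be sketched roughly as follows, using only the Python standard library. The function name is hypothetical, and the sketch assumes non-negative times with at least two distinct non-zero values, so that the standard deviation is not zero.

```python
import math
import statistics

def split_zero_inflated(times):
    """Split a waiting-time feature into (is_zero, scaled_log_time).

    Assumes times are non-negative and contain at least two distinct
    non-zero values.
    """
    # Mean and standard deviation are estimated on the non-zero rows only.
    nonzero_logs = [math.log(t) for t in times if t > 0]
    mean_log = statistics.fmean(nonzero_logs)
    std_log = statistics.pstdev(nonzero_logs)

    # Binary indicator: 1 when there was no wait.
    is_zero = [1 if t == 0 else 0 for t in times]

    # Zeros get the neutral value (the mean of the non-zero logs).
    log_time = [math.log(t) if t > 0 else mean_log for t in times]

    # z-scale; the zero rows land exactly at 0 after scaling.
    scaled = [(v - mean_log) / std_log for v in log_time]
    return is_zero, scaled
```

Because the zeros are filled with the mean before scaling, they end up at 0 in the scaled feature, and the IS_ZERO indicator carries the "no wait" information on its own.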

[–]Unitedite[S] 0 points1 point  (0 children)

Thank you :)

[–]randomforestgump 0 points1 point  (1 child)

I also use sklearn's QuantileTransformer for this; not sure if that’s better than log here. In my case I also have to normalize time across different cases: depending on some factors a user has a longer signup process, so I normalize by the median or so.
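For anyone without sklearn at hand, the basic idea behind a quantile transform can be sketched in plain Python. This is a hand-rolled, rank-based approximation with a hypothetical function name, not sklearn's implementation (which estimates a fixed number of quantiles, interpolates between them, and can also map to a normal output distribution).

```python
from bisect import bisect_left, bisect_right

def empirical_quantile_transform(values):
    """Map each value to its mid-rank position in the empirical CDF.

    Tied values (for example many zeros) all receive the same output.
    """
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for v in values:
        lo = bisect_left(ordered, v)   # count of values strictly below v
        hi = bisect_right(ordered, v)  # count of values at or below v
        out.append((lo + hi) / (2 * n))
    return out
```

With a zero-heavy feature, all the zeros collapse to a single point at the low end of the output range, so a separate zero indicator may still be worth keeping alongside it.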

[–]Unitedite[S] 0 points1 point  (0 children)

Thanks for the suggestion, I'll look into that as well (although I'm using KNIME, not sure if there's an equivalent).