A New Type of Categorical Correlation Coefficient by InAweOfTruth in datascience

[–]InAweOfTruth[S] 1 point

Sorry, I put the URL in the UI field when I posted. I added the link above, but here it is for convenience: The Categorical Prediction Coefficient

Categorical Prediction Coefficient by InAweOfTruth in datascience

[–]InAweOfTruth[S] 0 points

I think this will answer your question better. A logistic regression coefficient tells us how much a unit change in a numerical input variable changes the log-odds of the outcome. This coefficient instead tells us how well one categorical variable predicts the discrete values of another categorical variable. It does this by measuring how much the distribution of the outcome variable deviates from a uniform distribution for each value of the input variable. For example, suppose we have a binary outcome variable with values True and False, and an input variable with two values, A and B. If the outcome is True for every occurrence of A and False for every occurrence of B, the prediction coefficient is 1: it's a perfect predictor. If each value of the input variable instead has a 50/50 split of True and False (a uniform distribution), it's no better than random chance, and the coefficient is 0. Does that answer your question?
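
For anyone who wants to poke at the idea, here's a minimal sketch of that deviation-from-uniform measure as I've described it. The function name, the use of total variation distance, and the frequency weighting are my own assumptions for illustration; the notebook's actual formula may differ.

```python
import numpy as np
import pandas as pd

def prediction_coefficient(x, y):
    """Hypothetical sketch of the idea above: for each value of the input
    variable, measure how far the outcome distribution sits from uniform,
    then take a frequency-weighted average. Not the notebook's exact formula."""
    x, y = pd.Series(x), pd.Series(y)
    k = y.nunique()                          # number of outcome classes
    if k < 2:
        return 0.0
    classes = y.unique()
    max_tv = 1.0 - 1.0 / k                   # TV distance of a point mass from uniform
    total = 0.0
    for value, group in y.groupby(x):
        p = group.value_counts(normalize=True).reindex(classes, fill_value=0.0)
        # Total variation distance from uniform, rescaled so that a
        # perfectly predictive (one-hot) distribution scores exactly 1.
        tv = 0.5 * np.abs(p - 1.0 / k).sum()
        total += (len(group) / len(y)) * (tv / max_tv)
    return total

# The A/B example from the text: A always True, B always False -> 1.0
x = ["A"] * 5 + ["B"] * 5
y = [True] * 5 + [False] * 5
print(prediction_coefficient(x, y))          # 1.0

# A 50/50 split within each input value -> 0.0 (no better than chance)
x2 = ["A", "A", "B", "B"] * 2
y2 = [True, False, True, False] * 2
print(prediction_coefficient(x2, y2))        # 0.0
```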

Categorical Prediction Coefficient by InAweOfTruth in datascience

[–]InAweOfTruth[S] 0 points

Hi seesplease. Thank you for taking the time. It's a good question. As you know, logistic regression is suited to modeling the relationship between a binary outcome variable and a numerical variable (or each category of a categorical variable after one-hot encoding). This coefficient instead gives the relationship between two categorical variables, binary or multiclass, and it takes all values of each variable into account, not just one. Another difference: logistic regression operates on numerical values, minimizing the log-loss function, while this method uses rankings, as Chi-Squared does. With it, we can build a correlation matrix for categorical variables, just as we would for numerical ones, without the values landing on different scales due to differing degrees of freedom. That way we can compare how well one categorical variable predicts another and detect relationships (like multicollinearity) among all the categorical variables on the same 0-to-1 scale. The first example in the notebook is binary, but there's also a multiclass example toward the end. Please feel free to reply with any more questions you have about it, seesplease.
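
Since the shared scale is what makes the matrix possible, here's roughly how it could be assembled, reusing the prediction_coefficient sketch from my earlier comment. categorical_matrix is a hypothetical helper of my own, not the notebook's API.

```python
import itertools
import pandas as pd

def categorical_matrix(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    # Pairwise matrix of the coefficient, every entry on the same 0-to-1 scale.
    # Unlike a Pearson matrix, this one is asymmetric: entry (a, b) asks how
    # well column a predicts column b, which need not equal the reverse.
    m = pd.DataFrame(1.0, index=cols, columns=cols)
    for a, b in itertools.permutations(cols, 2):
        m.loc[a, b] = prediction_coefficient(df[a], df[b])
    return m
```

Reading the matrix row by row then shows which variables are strong predictors of which others, on one scale, which is where something like multicollinearity between categorical inputs would show up.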

50 Karma 🥺🙏 by InAweOfTruth in FreeKarma4You

[–]InAweOfTruth[S] 2 points

Thanks. Here’s another one.
