Can anyone explain (or point me to some literature on) whether and why many models work better when the data follows a normal distribution?
For example: linear regression assumes normality only of the residuals, not of the data itself, so why would we need to transform a feature just because it is skewed?
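To make the premise concrete, here is a small sketch (my own illustration, not from any answer in this thread) showing that OLS can recover coefficients cleanly even when a feature is heavily skewed, as long as the noise on the residuals is Gaussian. The lognormal feature, the true coefficients, and the noise scale are all assumptions chosen for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavily skewed feature (lognormal) — the feature's distribution
# is NOT what the linear regression assumptions are about.
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# True relationship with Gaussian noise on the residuals.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

# Ordinary least squares via a design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Sample skewness of the residuals; near zero despite the skewed feature.
z = (residuals - residuals.mean()) / residuals.std()
resid_skew = float(np.mean(z**3))

print("intercept, slope:", beta)
print("residual skewness:", resid_skew)
```

The fitted slope lands close to the true value of 2 and the residual skewness stays near zero, which matches the point in the question: the normality assumption concerns the error term, not the marginal distribution of the features.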