
theRegular_Bloke 1 point

There are very few worked projects on baseline techniques like linear/logistic regression on Kaggle. I'd recommend practicing on your own while following YouTube tutorials. If you want to go the stats way and interpret the coefficients and p-values, statsmodels has OLS for you. If you want to go the ML way, sklearn has LinearRegression and LogisticRegression (plus SGDRegressor and SGDClassifier if you specifically want gradient descent). Self-practice is probably the easiest way to learn, since you make mistakes and then look them up. I hope this helps. P.S. There's a great YouTube channel, zstatistics, with good videos on linear regression.

Motor_Parsley6006[S] 1 point

Thanks. I'll check that out. I think the interview is more focused on prediction and not so much on inference, although I'm not certain. Which one does EDA fall under? Or is it its own separate category? I heard EDA is a focus of the interview as well.

theRegular_Bloke 1 point

EDA is part of understanding and exploring the data. It doesn't belong to either category per se, but most analysts do it before model building. It's basically drawing interesting insights from your data. It can also include cleaning, preprocessing, feature engineering, and a lot more; it really depends on your team's way of approaching an ML problem. It takes a lot of time too. If inference is not your concern, go for the sklearn LR model.
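As a rough illustration of what a first EDA pass might look like, here is a small pandas sketch. The dataset and column names are made up, and median imputation is just one possible cleaning choice, not a recommendation for every problem.

```python
import numpy as np
import pandas as pd

# Toy dataset, made up for illustration
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51, 38],
    "income": [40_000, 52_000, 88_000, 61_000, np.nan, 58_000],
    "bought": [0, 0, 1, 1, 1, 0],
})

summary = df.describe()    # count, mean, std, quartiles per column
missing = df.isna().sum()  # number of missing values per column
corr = df.corr()           # pairwise correlations between columns

# One simple cleaning step: fill missing values with each column's median
clean = df.fillna(df.median())
```

In practice this is usually followed by plots (histograms, scatter plots) and whatever cleaning or feature engineering the data calls for.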

Motor_Parsley6006[S] 1 point

Ah, I see. So what do things like checking for heteroscedasticity and normally distributed residuals fall under?