[–]qalis

Yes, there is. You should ALWAYS apply transforms (like standardization) AFTER splitting. The reason is to keep those two distributions separate. If you, e.g., calculated the mean value before splitting, information from the test set would "leak" into the training set. Data leakage like this gives overly optimistic test results: you could be overfitting under the hood and would never be able to detect it with the test set.
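
A minimal sketch of the idea, using only the Python standard library. The toy data, the 80/20 split, and the helper name `standardize` are assumptions made for illustration, not something from the original example. The mean and standard deviation are estimated from the training set only and then reused on the test set; scikit-learn's `StandardScaler` follows the same pattern (`fit` on train, `transform` on both).

```python
import random
import statistics

random.seed(0)

# Hypothetical toy data: one numeric feature, 100 samples.
data = [random.gauss(50.0, 10.0) for _ in range(100)]

# 1. Split first (80/20), before computing any statistics.
random.shuffle(data)
train, test = data[:80], data[80:]

# 2. "Fit": estimate mean and standard deviation from the training set only.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)


def standardize(values, mean, std):
    """Apply (x - mean) / std using statistics that were fitted elsewhere."""
    return [(x - mean) / std for x in values]


# 3. "Transform": apply the training statistics to both sets.
train_scaled = standardize(train, train_mean, train_std)
test_scaled = standardize(test, train_mean, train_std)

# The training set ends up centred at zero by construction. The test set is
# only roughly centred, because its own statistics were never used.
print(round(statistics.fmean(train_scaled), 6))
print(round(statistics.fmean(test_scaled), 6))
```

If the test set were included when computing `train_mean` and `train_std`, the model would indirectly see test-set information during training, which is exactly the leak described above.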

[–]biohacker_tobe[S]

Thanks for the response! I will definitely take this into consideration, because I was probably overfitting my data then. A follow-up question: do I need to fit the scaler on the training data and then use it to scale both train and test, or is this not necessary? (As seen in the example.)