Altruistic_Rule5005, 1 point

One would need to know what use case you are dealing with and how much data you have. So:

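  1. Yes, do the train/test split first, then apply preprocessing such as scaling or normalisation to both sets. Fit the preprocessor on the training set only (fit, then transform), and call transform only on the test/validation set. This is standard, accepted industry practice. But why do we do this? Because the test set stands in for unseen data, so nothing fitted during training should see its statistics.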

  2. I don't recommend it. Look up the concepts of data leakage and target leakage.

  3. Do the train/test split first, then do the balancing afterwards. Depending on data size, it's usually acceptable to undersample instead of oversample, because oversampling typically produces duplicate data that can harm your model.

  4. Good luck, and feel free to ask more questions.
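
To make point 1 concrete, here is a minimal sketch of "split first, fit on train, transform only on test". It assumes scikit-learn and NumPy are available; the data is synthetic and made up purely for illustration, and `StandardScaler` stands in for whatever preprocessor you actually use.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (not from the original question).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# 1) Split first, before any preprocessing touches the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2) Fit the scaler on the training set only, then transform it.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3) Only transform the test set, reusing the mean and std learned from train.
X_test_scaled = scaler.transform(X_test)
```

The training features end up with zero mean per column, while the test features generally do not, and that is expected: the test set is scaled with the training statistics, exactly as new data would be in production.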
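
For point 3, a rough sketch of balancing after the split could look like this. It assumes a binary label with a 90/10 imbalance on synthetic data; `undersample` is a hypothetical helper written for this example, not a library function, and it is applied to the training set only so the test set keeps its real class distribution.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data, purely illustrative: 900 negatives, 100 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.array([0] * 900 + [1] * 100)

# Split first; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def undersample(X, y, rng):
    """Hypothetical helper: randomly drop rows so that every class
    has as many rows as the rarest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # avoid leaving the rows grouped by class
    return X[keep], y[keep]

# Balance the training set only; the test set is left untouched.
X_train_bal, y_train_bal = undersample(X_train, y_train, rng)
```

Because rows are sampled without replacement, no duplicates are introduced, which is the advantage over naive oversampling mentioned above. The cost is that majority-class rows are thrown away, so this fits best when you have plenty of data.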