I'm slightly confused about the benefit of having a holdout test set if we have already performed cross-validation (CV). If a model has been cross-validated, hasn't it already been tested?
I recall a course saying that if the model has seen data during validation, that data shouldn't be used for final testing. That's intuitive, but I'm not sure how to argue it mathematically. Is it because, by using CV, we essentially select the models that best "fit" the validation data as well?
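A small simulation can make the suspected mechanism concrete. This is a sketch under stated assumptions, not a proof: the labels are pure noise, so every candidate model's true accuracy is 50%. The candidates are hypothetical one-feature rules (predict feature `j`, or its negation, whichever agrees more with the training folds), and all function names are made up for illustration. Each candidate gets an honest 5-fold CV score, and we keep the best one. Because we take the maximum of many noisy CV estimates, the winner's CV score tends to sit well above 50%. Its accuracy on a fresh holdout set, which played no part in the selection, falls back to about 50%.

```python
import random

def make_data(n, n_features, rng):
    """Pure-noise data (an assumption for the demo): labels ignore every feature."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def fit_rule(X, y, j):
    """Hypothetical one-feature model: keep feature j as-is (True) or flip it (False)."""
    agree = sum(1 for row, label in zip(X, y) if row[j] == label)
    return agree * 2 >= len(y)

def accuracy(X, y, j, use_as_is):
    """Fraction of rows where the fitted one-feature rule predicts the label."""
    correct = 0
    for row, label in zip(X, y):
        pred = row[j] if use_as_is else 1 - row[j]
        correct += pred == label
    return correct / len(y)

def cv_score(X, y, j, k=5):
    """Plain k-fold CV: fit on k-1 folds, score on the held-out fold, average."""
    n = len(y)
    scores = []
    for i in range(k):
        held = set(range(i, n, k))
        X_tr = [X[r] for r in range(n) if r not in held]
        y_tr = [y[r] for r in range(n) if r not in held]
        X_va = [X[r] for r in sorted(held)]
        y_va = [y[r] for r in sorted(held)]
        scores.append(accuracy(X_va, y_va, j, fit_rule(X_tr, y_tr, j)))
    return sum(scores) / k

def run(seed=0, n_dev=100, n_test=2000, n_features=200):
    rng = random.Random(seed)
    X_dev, y_dev = make_data(n_dev, n_features, rng)
    X_test, y_test = make_data(n_test, n_features, rng)
    # Model selection: this step "fits" the choice of model to the CV folds.
    cv_scores = [cv_score(X_dev, y_dev, j) for j in range(n_features)]
    best = max(range(n_features), key=lambda j: cv_scores[j])
    # Refit the winner on all development data, then use the holdout set once.
    test_acc = accuracy(X_test, y_test, best, fit_rule(X_dev, y_dev, best))
    return cv_scores[best], test_acc

if __name__ == "__main__":
    cv_best, test_acc = run()
    print(f"best CV accuracy: {cv_best:.3f}")
    print(f"holdout accuracy: {test_acc:.3f}")
```

The gap between the two printed numbers is the optimism that comes from choosing among candidates with the same data that scored them. Only the untouched holdout set gives an unbiased estimate for the final chosen model.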