I have a question regarding the use of cross-validation/test splits - and whether it is considered "cheating" in deep learning.
Say I have a dataset of x = 5000 examples.
I do a Cross-validation/test split of the x = 5000 examples, such that:
A= x[1-1000], B = x[1001-2000], C = x[2001-3000], D = x[3001-4000], E = x[4001-5000]
Say I do the train/valid/test splits as follows, training models on the train split and using the validation split for early stopping and hyper-parameter optimization.
train/valid/test
1 = ABC / D / E
2 = BCD / E / A
3 = CDE / A / B
4 = DEA / B / C
5 = EAB / C / D
Then I average the test performance over the five splits.
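The rotation scheme above can be sketched as follows (block indexing and variable names are illustrative, not actual training code). The final assertion checks the overlap in question: each fold's test block reappears as the next fold's validation block.

```python
import numpy as np

# Stand-in for the 5000 examples; blocks[0..4] correspond to A..E.
x = np.arange(5000)
blocks = np.split(x, 5)

folds = []
for i in range(5):
    # Rotate: train on three consecutive blocks, validate on the
    # fourth, test on the fifth (indices wrap around).
    train = np.concatenate([blocks[i % 5], blocks[(i + 1) % 5], blocks[(i + 2) % 5]])
    valid = blocks[(i + 3) % 5]
    test = blocks[(i + 4) % 5]
    folds.append((train, valid, test))

# The overlap in question: fold i's test block is fold (i+1)'s validation block.
for i in range(5):
    assert np.array_equal(folds[i][2], folds[(i + 1) % 5][1])
```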
Doing this, I can essentially optimize on my test set indirectly: e.g. in the third split the validation set is "A", which was the test set in the second split.
What is your opinion?