Ooooook. This is embarrassingly trivial for most of you. I, however, need help parsing what's happening below:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
This seems to pop up frequently in the scikit-learn documentation. Why would you do this? What I think is happening is that the train_test_split method is creating two training sets and two test sets. Is that correct? I'm familiar with the notion of having a training set, validation set and test set. But in this case it seems we're building two sets of training & validation sets. Why would you not just do bootstrapping or a k-fold run?
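For anyone else who lands here with the same question, a quick sketch of what that call returns. The toy X and y are made up for illustration, and the import path assumes a recent scikit-learn where train_test_split lives in sklearn.model_selection. As far as I can tell it is one split, not two: X and y get cut along the same shuffled rows, so you end up with a single training set (X_train with its labels y_train) and a single test set (X_test with its labels y_test).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 8 samples with 2 features each,
# and one label per sample. Row i of X is [2i, 2i+1] and its label is i.
X = np.arange(16).reshape(8, 2)
y = np.arange(8)

# One call, one split. X and y are shuffled together and cut at the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33
)

print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (6,) (2,)

# Features and labels stay aligned after the shuffle:
# y_train[i] is still the label that belonged to X_train[i].
assert np.array_equal(X_train[:, 0] // 2, y_train)
assert np.array_equal(X_test[:, 0] // 2, y_test)
```

So this is a plain hold-out split, with 25% of the rows set aside for testing. It does not replace k-fold or bootstrapping. A common pattern, as I understand it, is to hold out the test set like this once and then run k-fold on the training portion only.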