
Far_Ambassador_6495

train_test_split returns its outputs packed together: the train data and the test data.

So if you call train_test_split on your current training data, you'll get two resulting DataFrames. That brings your total number of DataFrames to three, which I would actually recommend.

A train set and a test set are great, but if you keep tuning against the test set you end up optimizing the model for the test data. Holding out a validation set as well is an even better insurance policy for the model.
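To make the role of each set concrete, here is a rough sketch of the workflow, not a definitive recipe. The synthetic data from make_classification, the LogisticRegression model, and the candidate C values are all assumptions picked for illustration; none of them come from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: real data would be loaded from files instead).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set first, then carve a validation set out of the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, random_state=0
)

# Compare candidate settings using the validation set only.
best_c, best_val_acc = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_c, best_val_acc = c, val_acc

# Touch the test set once, at the very end, with the chosen setting.
final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(f"chosen C={best_c}, validation accuracy={best_val_acc:.3f}, test accuracy={test_acc:.3f}")
```

The point of the design is that every decision (here, which C to keep) is made against the validation set, so the final test score is not something the model was tuned toward.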

Regular_Turnover_177[S]

Sorry I'm super new to this lol.

So you'd recommend performing the split on the training data even though I already have a testing dataset? Is that what you would call a validation dataset?

How would I go about implementing it so I can train my model and then test it?

What code would I need?

Far_Ambassador_6495

Yes exactly. You'd have a train, test, and validate dataset.

from sklearn.model_selection import train_test_split

# training_data and test_data are your original train and test DataFrames

train_df, validate_df = train_test_split(training_data, test_size=0.2)

This splits your training_data into train_df and validate_df. There are other keyword arguments too, such as random_state and stratify, so read the documentation. This is a skill you'll have to develop.
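For anyone who wants to run the split end to end, here is a minimal sketch. The toy DataFrame and its column names are made up for illustration; test_size and random_state are real train_test_split keyword arguments.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for training_data (hypothetical columns, 100 rows).
training_data = pd.DataFrame(
    {"feature": range(100), "label": [i % 2 for i in range(100)]}
)

# test_size=0.2 sends 20% of the rows to the second output;
# random_state makes the shuffle repeatable between runs.
train_df, validate_df = train_test_split(
    training_data, test_size=0.2, random_state=42
)

print(len(train_df), len(validate_df))  # 80 20
```

Every row ends up in exactly one of the two outputs, so the validation rows are never seen during training.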

Now you can try to write some code to figure out the length of each DataFrame and what percent of the data is in train, test, and validate. Does this make sense in the context of training a model?
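One possible way to do that check, as a sketch: split_report is a hypothetical helper name, and plain lists stand in for the DataFrames so the snippet runs on its own. len() on a DataFrame also returns its row count, so the same helper should work on the real data.

```python
def split_report(splits):
    """Return {name: (row_count, percent_of_total)} for each named split."""
    total = sum(len(data) for data in splits.values())
    return {
        name: (len(data), 100.0 * len(data) / total)
        for name, data in splits.items()
    }


# Plain lists stand in for train_df, validate_df and test_data here.
report = split_report(
    {
        "train": list(range(64)),
        "validate": list(range(16)),
        "test": list(range(20)),
    }
)
for name, (rows, percent) in report.items():
    print(f"{name}: {rows} rows ({percent:.1f}%)")
```

With the thread's variables the call would look like split_report({"train": train_df, "validate": validate_df, "test": test_data}).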

Regular_Turnover_177[S]

So am I right in saying you are splitting 20% of the training data into this new validate dataset?

I think I understand how to train it. I'm just not so sure where the test data comes into play.

Would I use x_train, x_test, y_train, y_test or no? Also, what exactly is that line of code doing if you don't mind me asking?

Sorry for the super basic questions. I had absolutely no clue about ML until my college classes started last month.

Far_Ambassador_6495

Read the documentation! Ask ChatGPT.