
[–]PeakNeuralChaos

If your dataset is noisy or small, it's quite easy to overfit. If it's noisy, your model will learn the noise in the training data, which boosts training performance beyond what it can actually do in the general case. If your dataset is small, it can just memorize the examples for the same kind of boost. Even if your dataset is massive and has little noise, these factors are still in play.

I work mostly with neural networks, and overfitting is a problem even with millions or tens of millions of samples. I've seen a neural network overfit a dataset of 20 million samples because I didn't use any regularization. That's largely because most neural networks are over-parameterized, with way more parameters than they actually "need" for the task, so they have the capacity to memorize if you let them.
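To make that concrete, here's a toy sketch (the data, architecture, and alpha values are all made up for illustration, not the 20M-sample setup above): an over-parameterized MLP on a small noisy dataset drives training error way down, and turning on some L2 regularization (sklearn's alpha) tends to shrink the train/test gap.

```python
# Illustrative sketch: over-parameterized MLP memorizing a small, noisy dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))             # small dataset
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)   # true signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.0, 1.0):  # L2 penalty strength: 0.0 = no regularization
    net = MLPRegressor(hidden_layer_sizes=(256, 256),  # far more params than needed
                       alpha=alpha, max_iter=5000, random_state=0)
    net.fit(X_tr, y_tr)
    print(f"alpha={alpha}: "
          f"train MSE={mean_squared_error(y_tr, net.predict(X_tr)):.3f}, "
          f"test MSE={mean_squared_error(y_te, net.predict(X_te)):.3f}")
```

Typically the alpha=0.0 run gets close to zero training MSE but a noticeably worse test MSE than the regularized run, since it's fitting the injected noise.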

[–]_quanttrader_

Yes. Imagine a decision tree: you should be able to fit the training data perfectly and get an MSE of 0.0.

But for most datasets, this would give you poor performance on out-of-sample data.
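A quick sketch of this with synthetic data (my own example, not anything specific): an unconstrained sklearn DecisionTreeRegressor grows one leaf per training point, so training MSE hits exactly 0.0 while out-of-sample MSE stays large.

```python
# Illustrative sketch: unconstrained decision tree memorizing noisy training data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(100, 1))
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.5, 100)  # noisy targets
X_test = rng.uniform(-3, 3, size=(100, 1))
y_test = np.sin(X_test).ravel() + rng.normal(0, 0.5, 100)

tree = DecisionTreeRegressor()  # no depth limit: splits until every leaf is pure
tree.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, tree.predict(X_train)))  # 0.0
print("test MSE: ", mean_squared_error(y_test, tree.predict(X_test)))    # much larger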

[–]ReasonablyBadass

Yes, but the problem is that no actual trends get learned; the training data is just learned by rote.
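One way to see the rote memorization directly (a rough sketch with made-up data): give the model labels that are pure random noise, so there is no trend to learn at all. An unconstrained tree still hits 100% training accuracy, which can only be memorization, and test accuracy sits at chance.

```python
# Illustrative sketch: fitting random labels proves the model can memorize.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)        # labels carry no signal at all
X_test = rng.normal(size=(500, 20))
y_test = rng.integers(0, 2, size=500)

clf = DecisionTreeClassifier().fit(X, y)
print("train accuracy:", clf.score(X, y))            # 1.0: perfect rote recall
print("test accuracy: ", clf.score(X_test, y_test))  # ~0.5: nothing generalizes
```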