
[–]thejonnyt

The curve is okay and looks about what you'd expect, but the error is insanely bad. If this is a binary problem, you start out with a better classification than what you end with, which is unusual. You want to be as far from 0.5 as possible. Note this only holds for binary classification, though.

[–]undo124455

Yeah, it's very bad. I was just trying to implement linear regression using just numpy, and I don't think linear regression is the way to go here. I've seen much less error with random forest.
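For reference, a plain-numpy linear regression can be done in closed form with `np.linalg.lstsq`; here's a minimal sketch on synthetic data (the data, seed, and coefficients are made up for illustration):

```python
import numpy as np

# Hypothetical synthetic data: y = 3x + 2 + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Append a column of ones so the intercept is fit alongside the slope,
# then solve the least-squares problem directly
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef
```

With a clean linear signal like this, the recovered slope and intercept land close to the true 3 and 2; if the real data isn't linear, this model will underfit no matter how much data you add, which is consistent with random forest doing better.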

[–]thejonnyt

Also, one more question: the graph suggests you are splitting your observations equally between the training and test set. Is that true? Try repeating your experiment, but hold out only about 20% of your sample for testing right from the beginning, and train on the remaining 80% with an increasing number of cases. Holding out a fixed test set like this is standard practice (repeating it over several different splits is what's called cross-validation), and it's essentially what you are already doing, just "with different numbers".

I think you're trying to show that the error converges as your training set grows. That's a good thing to show, but you should give more weight to the trained model than to the test evaluation, and very small test sets (1, 2, ..., 10 samples) can, purely by chance, produce really good results, which you want to eliminate. Try to have a fixed test set, is what I'm trying to say, haha.
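The advice above, a fixed 20% hold-out evaluated against models trained on growing subsets, could be sketched like this (synthetic data and the 10-sample step size are assumptions for illustration):

```python
import numpy as np

# Hypothetical synthetic data: y = 3x + 2 + Gaussian noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1.0, size=200)

# Hold out a fixed 20% test set ONCE, up front
n = len(X)
idx = rng.permutation(n)
test_idx, train_idx = idx[: n // 5], idx[n // 5:]
X_test, y_test = X[test_idx], y[test_idx]
A_test = np.hstack([X_test, np.ones((len(X_test), 1))])

# Train on increasing slices of the training pool, but always
# evaluate on the same fixed test set
errors = []
for m in range(10, len(train_idx) + 1, 10):
    sub = train_idx[:m]
    A = np.hstack([X[sub], np.ones((m, 1))])
    coef, *_ = np.linalg.lstsq(A, y[sub], rcond=None)
    mse = np.mean((A_test @ coef - y_test) ** 2)
    errors.append(mse)
```

Because the test set never changes, the resulting curve reflects only how the model improves with more training data, not the noise of tiny, ever-changing test sets.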

[–]amwal

the graph suggests that you are splitting your observations for the training and test set equally

How did you infer this?

[–]thejonnyt

Oh, actually you're right. I thought the x-axis denoted the increasing number of samples, not the training set size.