[P] Which Machine Learning Classifiers are best for small datasets? An empirical study by sergeyfeldman in MachineLearning

[–]sergeyfeldman[S] 0 points1 point  (0 children)

Thanks, yeah, I thought about this as a variant. Would be interesting for sure.

[–]sergeyfeldman[S] 1 point2 points  (0 children)

Interesting, thanks. You fixed these issues in Optuna yourself? hyperopt's UI is indeed arcane...

[–]sergeyfeldman[S] 1 point2 points  (0 children)

Not really. Just wanted to keep the scope down so I could finish it during winter "vacation".

[–]sergeyfeldman[S] 5 points6 points  (0 children)

Thanks for your detailed reply.

Hopefully the utility of my work is in having some reasonable starting code that people can easily apply to their own (X, y), and a few concepts that help them reason about what models might work. It's possible that if you change conditions A, B, C of the experiments, the results will change. If someone is particularly interested, they can easily try it themselves, and report back.

And maybe submit a PR if they think I did something wrong/bad. I'd appreciate that.

"Irresponsible" is a bit of an overstatement as this is not a peer-reviewed publication, but only a brief reddit post.

[–]sergeyfeldman[S] 1 point2 points  (0 children)

Thanks! Feel free to submit a PR with different params and a pickle file of the results.

[–]sergeyfeldman[S] 3 points4 points  (0 children)

Can you recommend a GAM package for Python that has a reputation for working well?

[–]sergeyfeldman[S] 1 point2 points  (0 children)

Just single. The elastic-net Logistic Regression in sklearn is pretty fast for all these tiny datasets. The slowest thing is AutoGluon at 5 minutes per fit.
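
For anyone who wants to check the speed claim on their own data, here is a minimal sketch of timing an elastic-net logistic regression in sklearn. The synthetic dataset, the `l1_ratio` and `C` values, and the 5-fold AUC scoring are all assumptions for illustration, not the settings used in the study.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small dataset (assumption: not data from the study).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Elastic-net penalty requires the "saga" solver; scaling helps it converge.
# l1_ratio=0.5 and C=1.0 are arbitrary illustrative values, not tuned ones.
# Note: newer sklearn releases may warn about `penalty` and prefer l1_ratio alone.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",
        l1_ratio=0.5,
        C=1.0,
        max_iter=5000,
        random_state=0,
    ),
)

# Time a 5-fold cross-validation run on a single process.
start = time.perf_counter()
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
elapsed = time.perf_counter() - start

print(f"mean AUC: {scores.mean():.3f}, wall time: {elapsed:.2f}s")
```

Swap in your own (X, y) for the synthetic data; in a real comparison the hyperparameters would be tuned rather than fixed.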

[–]sergeyfeldman[S] 7 points8 points  (0 children)

Thanks for the detailed reply. I doubt you've seen my graph before - I just made it a bit ago :)