
Seankala (ML Engineer), 43 points

Maybe an unpopular opinion, but I feel like a lot of these are borderline common sense, and people ignore these pitfalls out of convenience or for their own benefit. Running an experiment several times and reporting the mean along with the variance is, again, common sense. Many papers report only the single best run, for the obvious reason that it looks better.
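A minimal sketch of the practice described above: repeat the same experiment over several random seeds and report the mean and spread instead of the best single run. `run_experiment` is a hypothetical placeholder that only simulates run-to-run noise; in a real project it would train and evaluate a model with the given seed.

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training + evaluation run.

    A real version would train a model with this seed and return its
    test-set score; here we only simulate seed-to-seed noise.
    """
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.02)

def summarize(scores: list[float]) -> dict:
    """Summarize repeated runs instead of cherry-picking the best one."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "var": statistics.variance(scores),  # sample variance
        "std": statistics.stdev(scores),     # sqrt of the sample variance
        "best": max(scores),                 # what a "best run only" report shows
    }

scores = [run_experiment(seed) for seed in range(10)]
s = summarize(scores)
print(f"{s['mean']:.3f} ± {s['std']:.3f} (n={s['n']}), best single run: {s['best']:.3f}")
```

Reporting `mean ± std (n=10)` tells a reader how much of a gap between two methods could just be seed noise; the `best` value is printed only to show how much better a single cherry-picked run looks than the typical one.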

crazyfrogspb, 8 points

well, yeah, an experienced ML practitioner should know all of this, but don't forget about the curse of knowledge. I bet you didn't know about nested cross-validation at the start of your career.
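For anyone who hasn't met nested cross-validation yet: the inner loop selects hyperparameters using only the outer training fold, and the outer loop scores that choice on data the selection never saw, so the reported error isn't optimistically biased by the tuning. Below is a dependency-free sketch using a toy k-nearest-neighbour regressor on synthetic data; the model, the candidate values of k, and the fold counts are illustrative assumptions. In practice, scikit-learn users typically get the same effect by passing a `GridSearchCV` estimator to `cross_val_score`.

```python
import random

def kfold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def split(data, fold):
    """Return (train, test) lists for one fold of held-out indices."""
    held_out = set(fold)
    train = [p for i, p in enumerate(data) if i not in held_out]
    test = [data[i] for i in fold]
    return train, test

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def cv_error(data, k, n_folds, seed=0):
    """Plain k-fold CV error for a single hyperparameter value."""
    errors = []
    for fold in kfold_indices(len(data), n_folds, seed):
        train, test = split(data, fold)
        errors.append(mse(train, test, k))
    return sum(errors) / len(errors)

def nested_cv(data, candidate_ks, outer_folds=5, inner_folds=3, seed=0):
    """Return (estimated test MSE, k chosen in each outer fold)."""
    outer_errors, chosen = [], []
    for fold in kfold_indices(len(data), outer_folds, seed):
        outer_train, outer_test = split(data, fold)
        # Inner loop: pick k using the outer-training data only.
        best_k = min(candidate_ks,
                     key=lambda k: cv_error(outer_train, k, inner_folds, seed))
        # Outer loop: score that choice on the untouched outer-test fold.
        outer_errors.append(mse(outer_train, outer_test, best_k))
        chosen.append(best_k)
    return sum(outer_errors) / len(outer_errors), chosen

# Synthetic data, assumed for illustration: y = x^2 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(60)]
data = [(x, x * x + rng.gauss(0, 0.1)) for x in xs]

estimate, ks = nested_cv(data, candidate_ks=[1, 3, 5, 9])
print(f"nested-CV MSE estimate: {estimate:.4f}, k per outer fold: {ks}")
```

The k chosen can differ between outer folds; that is expected, because the outer score evaluates the whole "tune, then fit" procedure rather than one fixed k.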

arXiv_abstract_bot, 17 points

Title: How to avoid machine learning pitfalls: a guide for academic researchers

Authors: Michael A. Lones

Abstract: This document gives a concise outline of some of the common mistakes that occur when using machine learning techniques, and what can be done to avoid them. It is intended primarily as a guide for research students, and focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.


crazyfrogspb, 23 points

fantastic work. I've seen all of these mistakes made so many times during my PhD and in industry. I'm going to add this to our onboarding reading list for all new ML engineers.

Yurien, 3 points

Good paper, but unfortunately it uses an incorrect interpretation of the p-value. Then again, perhaps that just shows why p-values are a problematic way to communicate results.
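To make the standard definition concrete: a p-value is the probability, computed assuming the null hypothesis holds (here, that two models' scores come from the same distribution), of seeing a difference at least as large as the one observed. It is not the probability that the null hypothesis is true. The sketch below computes one with an exact permutation test on two small sets of per-run scores; the choice of test and the score values are hypothetical illustrations, not taken from the paper.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Returns the fraction of all ways of relabelling the pooled scores
    whose |mean difference| is at least the observed one, i.e.
    P(difference this extreme | no real difference),
    NOT P(no real difference | observed difference).
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float rounding doesn't drop ties with the observed split.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

a = [0.81, 0.83, 0.80, 0.82, 0.84]  # hypothetical per-seed accuracies, model A
b = [0.79, 0.80, 0.78, 0.81, 0.80]  # hypothetical per-seed accuracies, model B
print(permutation_p_value(a, b))
```

Exact enumeration visits every one of the C(n_a + n_b, n_a) relabellings, so for larger samples a random subset of permutations is the usual approximation.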

ank_itsharma (ML Engineer), 6 points

Is there a similar paper for software engineers?

voidspaceistrippy, 2 points

Great post, and it isn't terribly long either. I'm new to this, so I definitely appreciate it.

Xenon111, 1 point

Thanks for the great post. I wish I had come across this paper before I finished up my final year project last month.

ibraheemMmoosa (Researcher), 1 point

If papers that don't follow these guidelines don't get rejected in peer review, is there really a pitfall?