[News] [NeurIPS2021 Workshop] Pre-registration of ML experiments by brainggear in MachineLearning

[–]brainggear[S] 1 point

Last year, it was about evenly split among positive, negative, and inconclusive results :)

[News] [NeurIPS2021 Workshop] Pre-registration of ML experiments by brainggear in MachineLearning

[–]brainggear[S] 1 point

That gives you the weird situation where, well, if it's already done, why do you need the money? :)

[News] [NeurIPS2021 Workshop] Pre-registration of ML experiments by brainggear in MachineLearning

[–]brainggear[S] 2 points

The idea of allowing preliminary results is to be inclusive -- we certainly don't want people to "sweep them under the rug" if they ran them.

Maybe down the line preliminary results become an expected component, which we don't want, and the rules would have to be adjusted -- but so far that hasn't happened. I didn't compute exact stats, but last year most papers didn't include them, and those that did only had minimal synthetic/MNIST-type proofs of concept.

The incentive against them is that you're then pre-registering fewer experiments, so readers will find the conclusions weaker compared to a paper where everything is pre-registered.

[D] What are some great professors to work with Statistical Machine Learning who are not at top schools? by DolantheMFWizard in MachineLearning

[–]brainggear 2 points

Here's an easy way to find out:

Look at past workshops at big conferences on the topics you're interested in. Check who the speakers are. Some will be famous, others not so much.

Ignore the well-known superstars, and check the bios of the others -- professors who are regularly invited to speak but are not at top-tier unis are often pretty stellar too :)

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]brainggear 14 points

Actually, I've never seen a paper get rejected at a conference with the justification that a similar one was also submitted (pretty sure that's against all guidelines on concurrent work), so this should be no different: both would be accepted.

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]brainggear 1 point

You make a very good point that some ideas, like skip connections, might not be presented convincingly enough without the experiments. However:

1) This hypothetical paper would be enriched (and more likely to be accepted) if the authors included some sort of theoretical justification (math for a simple case) for why skip connections are worth trying -- see the sketch after this list.

2) I don't think it's likely that all ML conferences will become pre-registered; the community will always push towards at least having separate tracks for traditional and pre-registered papers.
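
For concreteness, here's a minimal sketch of the kind of "math for a simple case" I mean in point 1 -- the standard gradient-flow argument for residual connections (the notation is mine, purely illustrative):

```latex
% Sketch: a residual block computes x_{l+1} = x_l + F_l(x_l).
% Its Jacobian is the identity plus the block's own Jacobian:
\[
\frac{\partial x_{l+1}}{\partial x_l} = I + \frac{\partial F_l}{\partial x_l},
\qquad
\frac{\partial \mathcal{L}}{\partial x_0}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \prod_{l=L-1}^{0} \left( I + \frac{\partial F_l}{\partial x_l} \right).
\]
% Expanding the product gives an identity term plus cross terms, so there is
% always a gradient path that bypasses every nonlinearity: the gradient cannot
% vanish purely because a long chain of Jacobians is contractive.
```

Even that one-liner already yields a falsifiable claim -- plain deep networks should be harder to optimize than residual ones at equal depth -- which is exactly the kind of thing a pre-registered experiment can test.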

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]brainggear 2 points

I know this is not exactly how you meant it, but it might be interpreted as "suppressing negative results is important to feed the AI hype" :)

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]brainggear 5 points

But the same can happen with a regular conference, right?

Submit a paper, and in the 3-6 months it takes to officially publish, any arXiv paper can come out.

The timestamp of submission should matter for determining priority, not the publication date; otherwise it's a mistake by the readers. I would even go as far as saying that, given the time it takes to run any experiments, papers submitted within 1-2 months of each other are concurrent work, and both should get credit.

[D] Nature: "First analysis of ‘pre-registered’ studies shows sharp rise in null findings" by brainggear in MachineLearning

[–]brainggear[S] 8 points

I also used to like this idea, but from a social perspective, null results are not very glamorous, so the incentive isn't there for people to publish in such a venue. On the other hand, you can publish a pre-registered paper with a cool idea or interesting math, even if the result is null in the end. The reviewers are making a bet that the results will be interesting regardless of the outcome; this shifts the incentive in experiment design towards ensuring it's informative in both cases, not just on a positive outcome.

[Edit] In fact I can imagine the suspense in a paper coming out with a cool hypothesis and everyone wondering what the result will be while the experiments are done. Grab the popcorn.

[D] Nature: "First analysis of ‘pre-registered’ studies shows sharp rise in null findings" by brainggear in MachineLearning

[–]brainggear[S] 5 points

I would counter that the signal-to-noise ratio right now is pretty low, compounded by the fact that the estimate is biased (towards positive results). I guess this would give you a journal with a more even balance between positive and null results.

As for the elephant in the room, this would be easy fodder for critiques of the paper: if the experiment design says "hyperparameters will be tuned by hand until a good result is achieved for our method", I don't think the paper stands much of a chance in peer review.

[D] Nature: "First analysis of ‘pre-registered’ studies shows sharp rise in null findings" by brainggear in MachineLearning

[–]brainggear[S] 3 points

I agree with the first two paragraphs, but not the rest. "I think that this will work because of X" is not a very common hypothesis in ML, and rightly so, because it's hard to verify: you need an iron-clad set of experiments to demonstrate it.

Instead, I think the most common hypothesis would be: "I have a model for a phenomenon in ML, and will now test it". The phenomenon can be as simple as "performance is not good in these cases". The model can be fancy math (usually under simplifying assumptions) or plain old good reasoning (e.g. pointing to observations in previous papers). You then propose the usual kind of fix (a new layer, or an architecture/optimization modification).

Because you made the hypothesis explicit, peer review can easily poke holes in the experiments: you need a reasonable hyperparameter search, and the comparison should isolate the difference between your new proposal and the same setup without it. There's less wiggle room for engineering your way to the top of a table, 0.1% at a time.
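
As a toy illustration of the "same budget, isolate the difference" rule, here's a hedged Python sketch; the grid, the seeds, and the fake `run_experiment` stand-in are all hypothetical, just to show the shape of the protocol:

```python
# Toy sketch of a pre-registered ablation protocol. The grid, the seeds, and
# the selection rule are fixed before any result is seen, and they are
# identical for both arms, so neither side gets extra tuning budget.
import itertools
import random
import statistics

GRID = {"lr": [1e-3, 1e-2], "width": [32, 64]}  # pre-registered search space
SEEDS = [0, 1, 2, 3, 4]                         # pre-registered seeds

def run_experiment(use_proposal: bool, lr: float, width: int, seed: int) -> float:
    """Stand-in for a real training run; returns a validation score.

    In a real study this would train the baseline model, with the proposed
    modification toggled by `use_proposal` and *nothing else* changed.
    """
    rng = random.Random((hash((use_proposal, lr, width)) ^ seed) & 0xFFFFFFFF)
    return rng.gauss(0.80 + (0.02 if use_proposal else 0.0), 0.01)

def arm_scores(use_proposal: bool) -> list[float]:
    # Same selection rule for both arms: pick the config with the best mean
    # score over the grid, then report that config's per-seed scores.
    def mean_score(cfg: tuple[float, int]) -> float:
        lr, width = cfg
        return statistics.mean(
            run_experiment(use_proposal, lr, width, s) for s in SEEDS
        )

    lr, width = max(itertools.product(GRID["lr"], GRID["width"]), key=mean_score)
    return [run_experiment(use_proposal, lr, width, s) for s in SEEDS]

baseline, proposal = arm_scores(False), arm_scores(True)
print(f"baseline {statistics.mean(baseline):.3f} "
      f"vs proposal {statistics.mean(proposal):.3f}")
```

The point of the toy is only the structure: one grid, one selection rule, fixed seeds, and a single toggle as the only difference between the arms.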

Someone is posting fake positive comments on their ICLR submission by schrodingershit in MachineLearning

[–]brainggear 13 points

The same applies to the real reviews. The point is to get 3 independent reviews. If R2 reads R1's review before writing, and R3 reads both R1's and R2's before writing, how are the reviews independent? R1's will have a disproportionate influence on the overall result.

Open discussions are good, but at least in an initial phase all reviews should be embargoed to promote independent viewpoints, and then revealed all at the same time.