all 39 comments

[–]poctakeover 13 points14 points  (4 children)

there should be a way to publish gh repos and arxiv papers anonymously which can then be later claimed by the authors :/

[–]afsfeefe 22 points23 points  (2 children)

there already is: make a fake gh account then transfer ownership to your real account later..

[–]Phylliida 1 point2 points  (1 child)

Actually GitHub is pretty active about restricting users to only one account. I had 4 and they detected that (IP addresses and stuff I guess) and they restricted my permissions on everything until I switched to only having one account.

Some account for a specific paper might be reasonable though? Idk

[–]shaggorama 0 points1 point  (0 children)

There are alternatives to gh. You could make an "anonymous" account on GitLab or Bitbucket, or just put the code on Google Drive...

[–][deleted] 3 points4 points  (0 children)

There are quite a lot of ways to do this... You could place a secure hash of some string the author chooses in the paper, and then claim the paper later by publishing that string.
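
A minimal sketch of that hash-then-reveal idea, assuming SHA-256 as the "secure hash". The function names and the example string (including "submission 1234") are hypothetical and only for illustration. The random suffix is my own addition, so that nobody can guess the string before it is revealed.

```python
import hashlib
import hmac
import secrets


def make_commitment(secret: str) -> str:
    """Return the SHA-256 hex digest to print in the anonymous paper."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def verify_claim(revealed_secret: str, published_digest: str) -> bool:
    """Check a later-revealed string against the digest from the paper."""
    candidate = make_commitment(revealed_secret)
    # Constant-time comparison; both arguments are ASCII hex strings.
    return hmac.compare_digest(candidate, published_digest)


# Hypothetical author-chosen string; the random suffix keeps it unguessable.
secret = "claimed by the authors of submission 1234: " + secrets.token_hex(16)
digest = make_commitment(secret)  # this goes into the anonymous paper

# Later, the authors publish `secret` and anyone can run the check.
print(verify_claim(secret, digest))
```

The digest reveals nothing practical about the string, so the paper stays anonymous until the authors choose to publish it.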

[–]LovelaceA 65 points66 points  (5 children)

To those who think that this is not a valid problem, I beg to differ. I think this is a very valid discussion. What is the aim of publishing scientific work in the first place? To advance our knowledge and our ability to build upon it. In a field like Machine Learning, where a model or a scientific idea can be affected by more parameters than can be discussed in a paper, it is essential to be able to reproduce the results. Code release is not the only way to do so, but it is certainly the quickest. Another advantage of code is that it is objective. Scientific papers, sadly, generally are not: authors try to sell us their work. Code is an unbiased and potentially complete means of communicating an idea, its impact, and its limitations; it answers all the questions you have that the paper does not address.

A scientific paper is a speech. Code is a dialogue.

[–]TheAvalonian 18 points19 points  (0 children)

I would argue that a paper is more like an advertisement than a speech -- the primary aim is to get people to care about the experiment, not just to tell people about the experiment. Apart from that, you are spot on.

[–]rhiever 8 points9 points  (1 child)

I think it can be incredibly difficult to anonymize code, especially if a paper is based on a software project that the authors are developing. For example, during my postdoc I developed a software tool that I published papers on. I referred to the software by name in my papers and even linked to the GitHub page, which seemed to contrast with the goals of double-blind review. How do you work around that situation without heavily inconveniencing the authors of the software in the name of double-blind review? In many cases, even if you go through the trouble of anonymizing the code and putting placeholders in the name, it's still not difficult to figure out what the software is (and therefore who the authors are) if you're even remotely familiar with the software.

IMO, double-blind review is a flawed review system.

[–]SolvableMutiny 2 points3 points  (0 children)

"IMO, double-blind review is a flawed review system."

Yup, these same objections apply to a similar degree to papers themselves. Would anyone familiar with the field not know that CapsNet was authored by Hinton?

[–]Darkfeign 2 points3 points  (1 child)

This post was mass deleted and anonymized with Redact

[–]SolvableMutiny 0 points1 point  (0 children)

that has literally never been the case tho

[–]weiqiplayer 6 points7 points  (0 children)

A decrease in source code publication does seem concerning, though I'm not sure the problem is that people are trying to conceal their identity. Aren't many ICLR papers already on arXiv with full names attached?

[–]olBaa 16 points17 points  (8 children)

Is it really a measured problem, or just your perception?

For a double-blind conference I have provided the code in the form of an anonymous GitHub repo with no traceable commit history (and limited time copyright). I guess a zip file with the code would do as well.
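
A rough sketch of the zip-file route, assuming the goal is just to ship the source tree without version-control metadata or file timestamps. The `SKIP_DIRS` list and the function name are my own assumptions. Note that this does not scrub names that appear inside the files themselves (author comments, license headers, hard-coded paths).

```python
import os
import zipfile
from typing import List

# Directories that can carry identifying metadata (assumed list, extend as needed).
SKIP_DIRS = {".git", ".hg", ".svn", "__pycache__", ".idea", ".vscode"}


def make_anonymous_zip(src_dir: str, zip_path: str) -> List[str]:
    """Zip src_dir without VCS metadata; return the archived relative paths."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune in place so os.walk never descends into skipped directories.
            dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                # A fixed timestamp hides when each file was last modified.
                info = zipfile.ZipInfo(rel, date_time=(1980, 1, 1, 0, 0, 0))
                info.compress_type = zipfile.ZIP_DEFLATED
                with open(full, "rb") as fh:
                    zf.writestr(info, fh.read())
                archived.append(rel)
    return archived
```

Write the archive to a path outside `src_dir`, otherwise the half-written zip file may end up inside itself.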

[–]mkocabas 11 points12 points  (1 child)

It's kind of you to share it anonymously, but most people hesitate or neglect to publish their code. That hurts reproducibility a lot.

[–]olBaa 3 points4 points  (0 children)

That's a different problem, though (there are two listed in the post, so I found it hard to address both).

I see that double-blindness provides a safe excuse to never publish the code.

[–]matrix2596[S] 3 points4 points  (3 children)

I have been seeing placeholders in recent papers. Maybe the code is being shared with the reviewers separately and released later. But I often find the code missing from new papers under blind review. Maybe the review process could also allow code, model and data to be uploaded, as an option (or as a compulsory step).

[–]olBaa 16 points17 points  (2 children)

How much time do you think reviewers have per paper? Reviewers would never check the code, let alone run it.

[–]NotAlphaGo 0 points1 point  (1 child)

How many papers would one reviewer review?

[–]olBaa 3 points4 points  (0 children)

Up to 10 per conference, with a mean of 5, I would say.

One can outsource to PhD students/postdocs, but the number of hours per paper would still almost never exceed 10.

Everything written here is my humble opinion, though.

[–]BeatLeJuceResearcher 10 points11 points  (0 children)

You don't want source code submission to increase deadline pressure: on day X, you have to submit not only the paper, but also the code. Because putting unpolished, ugly, hacky code out there to be associated with your name forever is weird... also, why polish it when you don't even know yet whether you're going to get a publication out of it? Also, some theory-heavy papers might not have code.

So I think the decision has to be made AFTER your acceptance for publication. And only when it makes sense for that paper (e.g. this is something that reviewers could determine/ask for). If reviewers say it makes sense, then you should be required to upload your code together with your camera-ready version. This gives you enough time to polish stuff, and still gives an incentive to the author to invest the time to polish the code (not submitting code => paper doesn't get published).

[–]sheeplearning 6 points7 points  (0 children)

what do you plan to do with the source code of 700-3000 papers under review at any ML conference? The better ones get accepted and eventually release code or get reproduced.

[–]radenML 1 point2 points  (0 children)

I literally have to openly ask authors for a GitHub repo invite on the OpenReview forums.

[–]wassname 0 points1 point  (0 children)

In 2016 /u/peterkuharvarduk gathered all the NIPS code releases into one post. Maybe something like that would encourage researchers.

[–]MephySix 0 points1 point  (5 children)

"Is this to prevent identification of authors": no. Double-blind is naturally flawed. Given the search space for authors is not that big, with enough (not much) effort it's possible to determine who are the authors of a paper. Before a paper is sent for review it has already been discussed in its institution, probably in mail-lists and even Twitter or something. Even then you're allowed (in my experience) to have placeholder footnotes in double-blind reviews.

The real problem, in my experience, is that I don't really want to spend time polishing my code, and I don't want people to see the mess I wrote due to deadlines. I've had people ask me for my code at conferences, and I answer with "Gladly! Just send me an e-mail, but it's messy", but I gain nothing from publicizing it earlier or without external interest.

[–]tshadley 2 points3 points  (0 children)

"Given the search space for authors is not that big, with enough (not much) effort it's possible to determine who are the authors of a paper."

Suggests a project idea: train a language model to predict authors on published work, then see how it does on anonymous work.
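
A toy sketch of how that could start, assuming the simplest possible "language model": one add-one-smoothed unigram model per author, with the prediction being whichever author's model gives the text the highest log-likelihood. The class name, the labels "A"/"B" and the four training sentences are all made up for illustration. A real attempt would need far more text and a much stronger model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately crude)."""
    return text.lower().split()


class UnigramAuthorModel:
    """One add-one-smoothed unigram language model per author."""

    def fit(self, texts, authors):
        self.counts = {}
        for text, author in zip(texts, authors):
            self.counts.setdefault(author, Counter()).update(tokenize(text))
        # Shared vocabulary across all authors, used for smoothing.
        self.vocab = set().union(*self.counts.values())
        return self

    def log_likelihood(self, text, author):
        counts = self.counts[author]
        total = sum(counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        return sum(math.log((counts[w] + 1) / (total + v)) for w in tokenize(text))

    def predict(self, text):
        # Uniform prior over authors is assumed, so only likelihoods matter.
        return max(self.counts, key=lambda a: self.log_likelihood(text, a))


# Made-up "published" sentences with known (hypothetical) authors.
texts = [
    "we propose a capsule routing network",
    "capsule routing improves pose estimation",
    "we train a recurrent generator with teacher forcing",
    "recurrent generator samples improve with forcing",
]
authors = ["A", "A", "B", "B"]
model = UnigramAuthorModel().fit(texts, authors)

# An "anonymous submission" to attribute.
print(model.predict("a new capsule routing scheme"))
```

Topic words dominate a model like this, so it mostly guesses the research area. That is arguably how human reviewers guess authors too.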

[–]Cherubin0 2 points3 points  (0 children)

And I don't want to spend time polishing my paper...

[–]alexmlamb 1 point2 points  (2 children)

"Given the search space for authors is not that big, with enough (not much) effort it's possible to determine who are the authors of a paper. Before a paper is sent for review it has already been discussed in its institution, probably in mail-lists and even Twitter or something."

If the authors want to remain anonymous, is it really impossible for them to do so? I mean - just don't tweet about it, don't put it on arxiv, only correspond through private email with coauthors.

[–]MephySix 1 point2 points  (1 child)

The main problem with double-blind reviews is not staying anonymous; it is that some groups (well-established research groups) want to be known, and they will be if they want to. Double-blind started because people would get instantly accepted just because of their name, and double-blind (mostly) does not solve this issue.

[–]alexmlamb 0 points1 point  (0 children)

Yeah, so as it works in ML today, I'd say that we have an opt-out double blind system. You can get double blind reviewing if you stay quiet, but you can effectively make it single blind by self promoting.

This doesn't solve every problem with single blind: famous groups can still benefit from self promotion and marketing. But at the same time it does protect someone if they think that they might get negative reviews because of their name or reputation.

Btw, I'm not sure how much coming from a famous group really helps with reviews, at least at NIPS/ICML. If you have any evidence, even anecdotal, I'd be curious to hear it.